Preface


The history of mathematics, like the life of each individual mathematician, is a 
story that begins with concrete experience and (generally) ends at high levels 
of abstraction. A good example, which we follow in this book, is the story 
of arithmetic. It begins with counting, then adding and multiplying; then it 
symbolizes this experience in equations. Next, it investigates equations via the 
abstract structures of groups, rings, and fields, and so on, to higher and higher 
levels of abstraction. This is a typical story, but the story alone does not explain 
why abstraction is necessary — or why it ever happened at all. 

The reason is that abstract structures distill the essence of many concrete 
structures, enabling us to see past a mass of distracting details. For example, 
it is an impossible task to list all the facts about addition and multiplication 
of numbers, and some specific questions about them were not answered for 
hundreds of years. Mathematicians have been able to answer some of the hard 
questions only by working with abstract concepts that encapsulate the nature 
of addition and multiplication. 

The art of algebra is the art of abstraction: choosing concepts that distill the 
essence of questions that interest us. To some extent the proof that we have 
chosen the “right” concepts is in the pudding. The right concepts answer many 
questions and make the answers seem obvious. But a concept may be “right” 
in the sharper sense that we can prove it is a necessary part of the answer. That 
is, an answer or solution exists only in structures that exemplify the concept in 
question. 

A famous example is the discovery by Galois of the group concept, 
which explains which polynomial equations have solutions by radicals. Galois 
associated a group — now called the Galois group — with each equation and 
showed that an equation is solvable by radicals if and only if its Galois group
has a certain property, now called solvability. Thus the concept of solvable
group is the “right” concept to explain solvability of equations. 

In this book we study a second famous example: Dedekind’s theory of 
rings and ideals, which explains the phenomenon of unique prime factorization 
in arithmetic and its generalizations. Again, there is an abstract algebraic 
concept — now called a Dedekind domain — that exactly captures the property 
of unique prime factorization. Dedekind domains are an equally good example 
of the power of abstraction, and in some ways easier than the group concept, 
since their algebra is commutative. They also have a natural motivation as an 
outgrowth of arithmetic — which is why our path starts with Euclid. 

The material in the book may be found in comprehensive graduate algebra 
texts, such as Zariski and Samuel (1958), Jacobson (1985), and Rotman (2015), 
but it is hard work to extract it from them. I prefer not to be comprehensive, 
so as to tell the story with only the essential abstractions, and to make it 
sufficiently self-contained to be accessible to undergraduates. This means 
including enough number theory to motivate the problem of unique prime 
factorization, which we do in the first three chapters. These chapters introduce 
algebraic numbers to solve classical equations such as the Pell equation, and 
the concepts of ring and field that abstract the algebra of these numbers. 

Accessibility to undergraduates, in my opinion, also means including the 
linear algebra needed to view number fields and number rings as vector spaces 
and modules. I realize that this opinion is somewhat controversial. Modern 
books on algebraic number theory commonly assume linear algebra is already 
known, and indeed, every undergraduate takes a course in linear algebra these 
days. But linear algebra is a multifaceted subject, and I doubt that many 
undergraduates know the subject from the viewpoint needed here, which varies 
the base field (or base ring) and relies on the trace, determinant, characteristic 
polynomial, and discriminant. Those who do may skip the parts where these 
topics are covered, but I believe they should at least be skimmed in order to 
see where linear algebra fits in the bigger algebraic picture. 

In fact the book closest to this one could be the classic telling of the story 
by Dedekind in 1877, which may be seen in English translation in Dedekind 
(1996). Dedekind’s account is at a lower level of generality than ours, being 
concerned only with the needs of number theory, but it follows a similar 
path. The advantage of raising the level of generality is that one sees how 
close Dedekind came to the ultimate setting for unique prime factorization. As 
Emmy Noether used to say: “Es steht alles schon bei Dedekind.” (“Everything 
is already in Dedekind.”)
I should say, however, that I raise the level of generality only in easy stages,
when it becomes necessary. As in the history of the subject, the general case 
appears only after the important special cases. 

To make the book useful to undergraduates and instructors, I have included 
many exercises, distributed in small batches at the end of most sections. 
These range from routine exercises, which test and reinforce understanding 
of new concepts, to exercise “packages” leading to substantial theorems. These 
theorems are often concrete consequences of the abstract machinery developed 
in the main text. The aim of each “package” is to reach an interesting goal by 
a sequence of easy steps, so the exercises include commentary to explain what 
the goal is and (in some cases) where to look for help later in the book. 

Although many important and useful results occur in exercises, it should be 
stressed that these results are not assumed in the main text. In a few cases they 
are later used in the main text, but only after the main text has proved them. 

In fact, the technical prerequisites for this book are small, since the whole 
point is to grow a big abstract structure from ideas in arithmetic. High 
school algebra should suffice, if it includes the matrix concept, and otherwise 
undergraduate linear algebra as far as matrices. Apart from these technical 
skills, however, the reader will also need sufficient mathematical maturity to 
be comfortable with abstractions. In most cases this will mean a couple of 
years of undergraduate mathematics, even a first course in abstract algebra. 
This book carries commutative algebra far beyond the typical first course, but 
it certainly will not hurt to have a first impression of fields, rings, and ideals. 


https://doi.org/10.1017/9781009004138.001 Published online by Cambridge University Press


Acknowledgments 


As usual, my greatest thanks go to my wife Elaine, who did the first round 
of proofreading and picked up many errors. Others were found by Mark 
Hunacek, Paul Stanford, and an anonymous reviewer, who also made valuable 
suggestions that clarified several points. I also thank the University of San 
Francisco and the DPMMS at the University of Cambridge for support during 
the writing of the book. 




1 Euclidean Arithmetic


Preview 


Euclid’s Elements, from around 300 BCE, is the source of many basic parts of 
modern mathematics, such as geometry, the axiomatic method, and the theory 
of real numbers. It is also the source of arithmetic as mathematicians know it: 
the theory of addition and multiplication of natural numbers, with emphasis on 
the concepts of divisibility and primes. 

For Euclid, a natural number b is a divisor of a natural number a if 


a = bc for some natural number c.


Then a natural number p > 1 is prime if its only divisors are itself and 1.
These concepts lead, as Euclid showed by a short but ingenious proof, to the 
discovery that there are infinitely many primes. 

Even more ingeniously, Euclid proved the prime divisor property: If a 
prime p divides a product ab, then p divides a or p divides b. His proof is 
based on the famous Euclidean algorithm for finding the greatest common 
divisor of two natural numbers. The prime divisor property easily implies 
what we now call the fundamental theorem of arithmetic, or unique prime 
factorization: Every natural number greater than 1 may be expressed uniquely 
(up to the order of factors) as a product of primes. 

Unique prime factorization is so useful that mathematicians would like it 
to hold wherever the concept of “factorization” makes sense. In fact, as we 
will see in later chapters, even when it is lost they will try to recover it. In 
this chapter we prepare to explore more general domains for factorization by 
introducing the concepts (and some examples) of ring and field.


1.1 Divisors and Primes


In this chapter we will be working mainly with the set N = {0, 1, 2, 3, 4, 5, ...}
of natural numbers. These are the numbers obtained from 0 by “counting”:
that is, by repeatedly adding 1. It follows (informally) that from any natural
number n we can reach 0 in a finite number of steps by “counting backwards,”
and hence that any nonempty set of natural numbers has a least member. Since Euclid,
this so-called well-ordering property of N has been the basis of virtually 
all reasoning about the natural numbers, so it is usually taken as an axiom. 
In this section we will use it, as Euclid did, to prove results about divisibility 
and primes. 

We have already said what it means for a natural number b to divide a 
natural number a; namely, a = bc for some natural number c. So if b > 0 does
not divide a, we necessarily have, for any natural number q with bq ≤ a,


a = bq + r, with r > 0.


When r is least possible, we call q the quotient (of a by b) and r the
remainder. It then follows that 0 < r < b, because if r ≥ b, say r = b + r′
with r′ ≥ 0, we would have


a = b(q + 1) + r′, contrary to the assumption that r is the least remainder.


The two cases, where b does and does not divide a, can be combined in 
the following division property: For any natural numbers a and b > 0, there are
natural numbers q and r such that


a = bq + r, where 0 ≤ r < b. (*)


This property is often misleadingly called the “division algorithm.” (It is not 
an algorithm, but it paves the way for the very important Euclidean algorithm, 
as we will see in the next section.) Finding the quotient and remainder for a 
given pair a,b is called division with remainder. 
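
The least-remainder argument can be mirrored in a few lines of Python. This is only a sketch, assuming a ≥ 0 and b > 0; the function name least_remainder is chosen here for illustration and does not come from the text. It finds the least remainder by repeated subtraction and compares the result with Python's built-in divmod.

```python
def least_remainder(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a = b*q + r and 0 <= r < b, for natural a and b > 0."""
    if a < 0 or b <= 0:
        raise ValueError("expected a >= 0 and b > 0")
    q, r = 0, a
    # Subtract b as long as the result stays a natural number;
    # when the loop stops, r is the least possible remainder.
    while r >= b:
        q, r = q + 1, r - b
    return q, r


q, r = least_remainder(34, 21)
print(q, r)                      # 1 13
print((q, r) == divmod(34, 21))  # True
```

The loop makes the well-ordering argument concrete: the remainders r form a decreasing sequence of natural numbers, so the process must stop.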

Another easy application of well-ordering of N tells us that every natural 
number greater than 1 is divisible by a prime. Start with any natural number
a > 1. If a is not prime, then a = bc for some smaller numbers b and c. Then if
b is not prime, we have b = de for some smaller natural numbers d and e, and
so on. Since natural numbers cannot decrease forever, this process must halt,
necessarily with a prime p that divides a. It follows, by repeatedly finding
prime divisors, that every natural number greater than 1 has a prime factorization.
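
As an illustration, here is a minimal Python sketch of "repeatedly finding prime divisors." It uses a slight variation of the descent described above: instead of splitting a = bc arbitrarily, it takes the least divisor greater than 1, which is necessarily prime. The function names least_divisor and prime_factorization are assumptions made for this example.

```python
def least_divisor(a: int) -> int:
    """Return the least divisor d > 1 of a > 1; such a d is necessarily prime."""
    if a <= 1:
        raise ValueError("expected a > 1")
    d = 2
    # The loop halts at the latest when d == a, since a divides a.
    while a % d != 0:
        d += 1
    return d


def prime_factorization(a: int) -> list[int]:
    """Split off prime divisors until only 1 remains."""
    factors = []
    while a > 1:
        p = least_divisor(a)
        factors.append(p)
        a //= p
    return factors


print(prime_factorization(60))  # [2, 2, 3, 5]
print(prime_factorization(84))  # [2, 2, 3, 7]
```

Trial division like this is slow for large numbers, which is consistent with the remark later in the section that finding divisors is a hard problem.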

With these easy properties of divisors and primes, we are now ready for 
something ingenious: Euclid’s proof that there are infinitely many primes.


Infinitude of primes. For any prime numbers $p_1, p_2, \ldots, p_k$, there is a prime
number $p_{k+1} \ne p_1, p_2, \ldots, p_k$.

Proof. Consider the number $N = (p_1 \cdot p_2 \cdots p_k) + 1$. None of $p_1, p_2, \ldots, p_k$
divides N because each of them leaves remainder 1. But some prime divides N
because N > 1. This prime is the $p_{k+1}$ we seek. □


The beauty of this proof is that it avoids having to find any pattern in the 
sequence of primes, or finding divisors of a number, both of which are hard 
problems. 
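
Euclid's construction is easy to try out. The following Python sketch (the function name prime_outside is hypothetical, chosen for this example) forms N from a given list of primes and returns the least divisor of N greater than 1, which is a prime not in the list. It is meant only as a small experiment, not as an efficient way to generate primes.

```python
from math import prod


def prime_outside(primes: list[int]) -> int:
    """Return a prime divisor of N = (product of the given primes) + 1."""
    n = prod(primes) + 1
    d = 2
    # The least divisor d > 1 of n is prime. It cannot be one of the
    # given primes, because each of those leaves remainder 1.
    while n % d != 0:
        d += 1
    return d


print(prime_outside([2, 3, 5]))             # 31
print(prime_outside([2, 3, 5, 7, 11, 13]))  # 59, since 30031 = 59 * 509
```

The second call shows that N itself need not be prime; the proof only needs some prime divisor of N.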


1.1.1 The Euclidean Algorithm 


Although it is hard to find the divisors of a given (large) natural number, 
it is surprisingly quick and easy to find common divisors of two natural 
numbers. This can be done by the Euclidean algorithm for finding the 
greatest common divisor gcd(a, b) of two natural numbers a and b. As Euclid 
described it (Elements, Book VII, Proposition 1), the algorithm “repeatedly
subtracts the lesser number from the greater.” More formally, it repeatedly
replaces the pair {a,b}, where a > b, by the pair {b,a — b} until the members 
of the pair become equal — at which stage each member is gcd(a, b). 

For example, if we begin with the pair {34,21}, the pairs produced by the 
algorithm are the following:


{34,21} → {21,13} → {13,8} → {8,5} → {5,3} → {3,2} → {2,1} → {1,1}.


And we conclude that gcd(34, 21) = 1. 
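
The run on {34, 21} can be reproduced with a short Python sketch of the subtractive algorithm. The function name euclid_by_subtraction and the choice to store each pair with the larger member first are assumptions made here for readability.

```python
def euclid_by_subtraction(a: int, b: int) -> list[tuple[int, int]]:
    """Return the pairs produced from {a, b}, for natural numbers a, b > 0."""
    if a <= 0 or b <= 0:
        raise ValueError("expected a, b > 0")
    pairs = [(max(a, b), min(a, b))]
    while pairs[-1][0] != pairs[-1][1]:
        greater, lesser = pairs[-1]
        difference = greater - lesser
        # Replace {greater, lesser} by {lesser, greater - lesser}, larger first.
        pairs.append((max(lesser, difference), min(lesser, difference)))
    return pairs


pairs = euclid_by_subtraction(34, 21)
print(pairs)         # [(34, 21), (21, 13), (13, 8), (8, 5), (5, 3), (3, 2), (2, 1), (1, 1)]
print(pairs[-1][0])  # 1, the gcd
```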
In general, the correctness of the Euclidean algorithm is guaranteed by the 
following theorem. 


Euclidean algorithm produces the gcd. If the Euclidean algorithm is
applied to two natural numbers a,b > 0, then it terminates in a finite number 
of steps with the pair whose members are both gcd(a, b). 


Proof. Suppose that d is any common divisor of a and b, where a > b. This
means that a = a′d and b = b′d for some a′, b′ > 0, and hence that


a − b = (a′ − b′)d.


Thus, d is also a divisor of a − b. There is a similar proof that any common
divisor of two numbers is also a divisor of their sum, so a divisor of b and a − b
is also a divisor of b + (a − b) = a. It follows that each pair produced by the
Euclidean algorithm has the same common divisors, and hence the same gcd.

Now, as long as the pairs produced by the algorithm are unequal, subtraction
occurs, and it will decrease the sum of the two members of the pair. By
the well-ordering of N, the sum cannot decrease forever, so the algorithm
necessarily halts with a pair of equal numbers. Being equal, they equal their
own gcd; hence they each equal gcd(a, b). □


In practice it is usual to speed up the Euclidean algorithm by doing division 
with remainder instead of subtraction. That is, we replace the pair {a,b}, 
where a > b, with the pair {b,r}, where r is the remainder when a is divided 
by b. This process is simply a shortening of repeated subtraction, because r 
can be found by subtracting b repeatedly from a. However, the usual “long 
division” process generally finds r more quickly than repeated subtraction. 

In fact, by using division with remainder, we can be sure that the number of
steps required for the Euclidean algorithm to halt is at most roughly proportional
to the number of decimal digits in a. The example above, incidentally, is one where
each division with remainder is actually the same as a single subtraction. This
happens whenever a and b are a pair of consecutive Fibonacci numbers: the
numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ... defined by


$F_0 = 0$, $F_1 = 1$, $F_{n+2} = F_{n+1} + F_n$.


This is the case where the Euclidean algorithm runs most slowly. But even 
here, the number of steps is roughly proportional to the number of decimal 
digits. 


Exercises 


1. Explain why the Euclidean algorithm, applied to the pair $\{F_{n+2}, F_{n+1}\}$,
yields all preceding pairs of consecutive Fibonacci numbers.
2. Deduce that $\gcd(F_{n+2}, F_{n+1}) = 1$.


Division with remainder is the preferred way to run the Euclidean algorithm 
in practice, because it is generally faster. But it also has advantages in theory, 
since it applies in situations (such as division of polynomials) where division 
with remainder is not achievable by repeated subtraction. In the case of 
ordinary positive integers a, b, the process of repeated division with remainder 
can be elegantly “frozen in time” by the so-called continued fraction for a/b. 

Given positive integers a > b, the continued fraction process finds $q_1 > 0$
and $r_1 \ge 0$ (“quotient” and “remainder”) such that $a = bq_1 + r_1$ with $r_1 < b$,
and we write down the equivalent equation

$$\frac{a}{b} = q_1 + \frac{r_1}{b}.$$

If $r_1 = 0$, then the process ends there, because we have found that b divides a
and hence that gcd(a, b) = b.

If $r_1 > 0$, then we rewrite the above equation as

$$\frac{a}{b} = q_1 + \cfrac{1}{b/r_1},$$

and repeat the process on the fraction $b/r_1$ (which we can do since $b > r_1 > 0$).
In this way we can simulate the action of the Euclidean algorithm on a pair
(a, b) by the process of “continuing” a fraction a/b.
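
The continued fraction process is short enough to sketch in Python. The function names continued_fraction and evaluate below are hypothetical, and the pair 43, 19 is an example chosen here so as not to anticipate the exercises. The first function records the successive quotients; the second rebuilds the fraction from them as a check, using exact arithmetic from the standard fractions module.

```python
from fractions import Fraction


def continued_fraction(a: int, b: int) -> list[int]:
    """Return the successive quotients for a/b, for positive integers a, b."""
    quotients = []
    while b != 0:
        q, r = divmod(a, b)   # a = b*q + r with 0 <= r < b
        quotients.append(q)
        a, b = b, r           # continue with the fraction b/r
    return quotients


def evaluate(quotients: list[int]) -> Fraction:
    """Rebuild q1 + 1/(q2 + 1/(q3 + ...)) from its list of quotients."""
    value = Fraction(quotients[-1])
    for q in reversed(quotients[:-1]):
        value = q + 1 / value
    return value


print(continued_fraction(43, 19))            # [2, 3, 1, 4]
print(evaluate(continued_fraction(43, 19)))  # 43/19
```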


3. Explain why the continued fraction process terminates for any positive
integers a, b.
4. Applying the continued fraction process to 23 and 5, show that

$$\frac{23}{5} = 4 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{2}}}.$$


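Division with remainder also has a neat representation by 2 × 2 matrices, in
which division with remainder corresponds to extracting a matrix factor from a
column vector. In this setup, the pair {a, b} is represented by the column vector

$$\begin{pmatrix} a \\ b \end{pmatrix}, \quad \text{where } a > b.$$

5. If $a = q_1 b + r_1$, show that $\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b \\ r_1 \end{pmatrix}$.

Then, if $b > r_1 \ne 0$, one can repeat the process on the column vector $\begin{pmatrix} b \\ r_1 \end{pmatrix}$.

6. Show in particular that $\begin{pmatrix} 23 \\ 5 \end{pmatrix} = \begin{pmatrix} 4 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.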


1.2 The Form of the gcd 


The correctness of the Euclidean algorithm says that gcd(a, b) results from the 
pair {a,b} by repeated subtraction. This implies that gcd(a, b) has a very simple 
symbolic form. Because subtraction is involved, the form involves integers; 
that is, natural numbers and their negatives. The system of integers is denoted 
by Z, from the German word “Zahlen” for numbers.


Form of the gcd. For any natural numbers a, b > 0, there are m, n ∈ Z such
that


gcd(a, b) = ma + nb.


Proof. We show in fact that the numbers produced from a, b at each step of
the Euclidean algorithm are of the form ma + nb. This is certainly true at the
beginning, where a = 1·a + 0·b and b = 0·a + 1·b.

And if the pair at some stage is $\{m_1 a + n_1 b,\ m_2 a + n_2 b\}$, then the pair at
the next stage is $\{m_2 a + n_2 b,\ (m_1 - m_2)a + (n_1 - n_2)b\}$, which again consists
of numbers of the required form.

Thus, the numbers at all stages are of the form ma + nb. In particular, this
is true at the last stage, when each number is gcd(a, b). □


Given a pair of moderately sized numbers a, b (say, two-digit numbers), it 
may be hard to spot m and n such that gcd(a, b) = ma + nb. However, m and n
are easily computed by running the Euclidean algorithm on the letters a and b, 
doing exactly the same subtractions on the symbolic forms that we originally 
did on numbers. For example, here is what happens when we run the numerical 
and symbolic computations side by side in the case where a = 34 and b = 21. 


  {34, 21}                      {a, b}
→ {21, 34 − 21} = {21, 13}    → {b, a − b}
→ {13, 21 − 13} = {13, 8}     → {a − b, b − (a − b)} = {a − b, −a + 2b}
→ {8, 13 − 8} = {8, 5}        → {−a + 2b, a − b − (−a + 2b)} = {−a + 2b, 2a − 3b}
→ {5, 8 − 5} = {5, 3}         → {2a − 3b, −a + 2b − (2a − 3b)} = {2a − 3b, −3a + 5b}
→ {3, 5 − 3} = {3, 2}         → {−3a + 5b, 2a − 3b − (−3a + 5b)} = {−3a + 5b, 5a − 8b}
→ {2, 3 − 2} = {2, 1}         → {5a − 8b, −3a + 5b − (5a − 8b)} = {5a − 8b, −8a + 13b}.


From the last line we read off 1 = gcd(a, b) = −8a + 13b, and it can be
checked that indeed 1 = −8·34 + 13·21.
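
The symbolic computation can be automated by carrying the coefficients of a and b along with each number. The Python sketch below follows the subtractive form used in the proof; the function name gcd_with_coefficients and the triple representation (value, m, n) are assumptions made for this illustration.

```python
def gcd_with_coefficients(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, m, n) with d = gcd(a, b) = m*a + n*b, for natural a, b > 0."""
    # Each triple (value, m, n) satisfies value == m*a + n*b throughout.
    x = (a, 1, 0)
    y = (b, 0, 1)
    while x[0] != y[0]:
        if x[0] < y[0]:
            x, y = y, x  # keep the greater number first
        # Replace the pair {x, y} by {y, x - y}, subtracting coefficients too.
        x, y = y, (x[0] - y[0], x[1] - y[1], x[2] - y[2])
    return x


print(gcd_with_coefficients(34, 21))  # (1, -8, 13)
```

The output agrees with the last line of the side-by-side computation: 1 = −8·34 + 13·21.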

The symbolic form of the Euclidean algorithm, and hence of the gcd, 
was not known to Euclid. Indeed, written calculation with numbers did not 
develop until centuries after him, because numerical calculation could be done 
perfectly well with the abacus. And it was not until the sixteenth century that 
mathematicians realized that written calculation with symbols (“algebra”) was 
a powerful idea — in fact more powerful than written calculation with numbers. 
Still, even with the primitive notation at his disposal, Euclid was able to prove 
the prime divisor property, the main result of the next section. 


1.2.1 Linear Diophantine Equations 


The equation ax + by = c, where a,b,c are integers, becomes interesting 
when integer solutions for x and y are sought. The equation obviously has no
such solution when gcd(a, b) does not divide c, because in that case gcd(a, b)
divides ax + by but not c. However, this is the only obstruction. 


Criterion for solvability. If gcd(a, b) divides c, then ax + by = c has an
integer solution. 


Proof. It follows from the above that gcd(a, b) = ma + nb for some integers
m and n. Then, if c = d·gcd(a, b), it follows that ax + by = c for x = dm
and y = dn. □


This criterion for solvability generalizes to linear equations in more than
two variables. For example, ax + by + cz = d has an integer solution ⇔
gcd(a, b, c) divides d. The (⇒) direction is clear, for the same reason as above.
The (⇐) direction holds because


gcd(a, b, c) = la + mb + nc for some integers l, m, n,


which follows from the above because gcd(a, b, c) = gcd(gcd(a, b), c).

We also know that we can find the required m,n for gcd(a,b) by the 
extended Euclidean algorithm described above. Finally, we can find all 
solutions of ax + by = c by adding to any single solution the solutions of
ax + by = 0, which are x = kb/gcd(a, b), y = −ka/gcd(a, b) for all integers k.
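
Putting these observations together gives a complete recipe for ax + by = c. The following Python sketch assumes a, b > 0 and uses the division form of the Euclidean algorithm to find the coefficients; the function names gcd_with_coefficients and solve, and the parameter k indexing the solutions, are choices made here for illustration.

```python
def gcd_with_coefficients(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b, for natural a, b > 0."""
    r0, m0, n0 = a, 1, 0
    r1, m1, n1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        # Both triples keep the invariant r == m*a + n*b.
        r0, m0, n0, r1, m1, n1 = r1, m1, n1, r0 - q * r1, m0 - q * m1, n0 - q * n1
    return r0, m0, n0


def solve(a: int, b: int, c: int, k: int = 0):
    """Return the k-th integer solution (x, y) of a*x + b*y = c, or None."""
    g, m, n = gcd_with_coefficients(a, b)
    if c % g != 0:
        return None  # gcd(a, b) does not divide c, so there is no solution
    d = c // g
    # One solution (d*m, d*n) plus k times the solution (b/g, -a/g) of a*x + b*y = 0.
    return d * m + k * (b // g), d * n - k * (a // g)


print(solve(34, 21, 5))       # (-40, 65)
print(solve(34, 21, 5, k=1))  # (-19, 31)
print(solve(6, 4, 5))         # None
```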

With these observations we can move on to Diophantine equations of higher 
degree. We begin in Section 1.5 with a quadratic equation in two variables. 
Other examples, of degree 2 and 3, are discussed in the next chapter. But first, 
let us see what the gcd can tell us about prime numbers. 


Exercises 


1. Using the symbolic Euclidean algorithm above, find integers m,n such 
that 13m + 17n = 1. 


The matrix version of division with remainder, explored in the previous set 
of exercises, can be very elegantly “inverted” to give the integers m and n 
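such that gcd(a, b) = ma + nb. Recall that $a = q_1 b + r_1$ is represented by the
matrix equation

$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b \\ r_1 \end{pmatrix}.$$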


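2. Show that if repeated division with remainder on the pair a, b produces
successive quotients $q_1, q_2, \ldots, q_k$ and gcd(a, b) = d, then

$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} q_2 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} q_k & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} d \\ 0 \end{pmatrix}.$$

3. Deduce that

$$\begin{pmatrix} d \\ 0 \end{pmatrix} = \begin{pmatrix} q_k & 1 \\ 1 & 0 \end{pmatrix}^{-1} \cdots \begin{pmatrix} q_2 & 1 \\ 1 & 0 \end{pmatrix}^{-1} \begin{pmatrix} q_1 & 1 \\ 1 & 0 \end{pmatrix}^{-1} \begin{pmatrix} a \\ b \end{pmatrix}$$

and show that

$$\begin{pmatrix} q & 1 \\ 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & -q \end{pmatrix}.$$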

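4. Deduce from exercise 6 of Section 1.1 that

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & -4 \end{pmatrix} \begin{pmatrix} 23 \\ 5 \end{pmatrix},$$

and hence express the gcd of 23 and 5 in the form 23m + 5n.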


Another way to prove gcd(a,b) = ma + nb is by considering the smallest 
positive value c of ma + nb for m,n € Z. This idea will be used in Section 5.2 
to prove that Z is a principal ideal domain. 


5. Show that all values of ma + nb are multiples of c (this part uses the 
division property of Z). 

6. Deduce that c divides a and b, and that any divisor of a and b divides c. 

7. Conclude that c = gcd(a, b).


1.3 The Prime Divisor Property 


The relevance of the Euclidean algorithm to the theory of primes becomes clear 
when we consider gcd(a, p), where p is prime. If p does not divide a, then we 
must have gcd(a, p) = 1, because the only divisors of p are 1 and p itself.
This leads to a crucial result. 


Prime divisor property. If a and b are natural numbers and p is a prime that
divides ab, then p divides a or p divides b. 


Proof. Suppose that p does not divide a, so we must prove that p divides b. 
First, as we have just remarked, gcd(a, p) = 1. Also, as we saw in the previous 
section, gcd(a, p) = ma + np for some integers m and n, so 


1 = ma + np for some integers m and n.

Multiplying both sides of this equation by b, we get


b = mab + npb for some integers m and n.

Since p divides ab by hypothesis and p obviously divides pb, b is a sum of
terms divisible by p. Hence, b itself is divisible by p. □


In proving this prime divisor property, Euclid came as close as he probably 
could (given his poor notational resources) to proving what we now call the 
fundamental theorem of arithmetic, or unique prime factorization. Unique 
prime factorization easily follows from the prime divisor property if one has 
notation for arbitrary products of primes. 


Unique prime factorization. If $p_1, p_2, \ldots, p_k$ and $q_1, q_2, \ldots, q_l$ are prime
numbers such that


$p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l$,

then the same factors occur on each side, perhaps in a different order.


Proof. Since $p_1$ divides the left side of the equation, it also divides the right
side; hence, it divides one of the factors $q_j$ by the prime divisor property.
Because the only divisors of the prime $q_j$ are 1 and $q_j$ itself, it follows that
$p_1 = q_j$, and we may cancel $p_1$ and $q_j$ from the equation.
Repeating the argument with the factors that remain, we eventually find that
each $p_i$ equals some $q_j$, and vice versa, so the factors on each side are exactly
the same, though perhaps in a different order. □


We sometimes express this theorem by saying that factorization of a natural
number greater than 1 into primes is unique “up to the order of factors.”
Later, we will see many other statements of unique prime factorization, and
the “uniqueness” will be “up to order” and sometimes other trivial variations.
For example, prime factorization of integers is unique not only “up to order”
but also “up to sign” because, for example, 6 = 2·3 = (−2)·(−3).

The next section gives some applications of unique prime factorization. Due 
to its usefulness and simplicity, unique prime factorization has been sought in 
many other domains where “factorization” makes sense. In fact, a major theme 
of this book is the search for appropriate concepts of “prime” in domains where 
the obvious kind of factorization fails to be unique. 


Exercises 


In school you may have used prime factorization to find the gcd (“greatest
common divisor”) and the lcm (“least common multiple”) of given positive
integers. We can justify this idea with the help of unique prime factorization. 


1. Find the gcd of 60 and 84 by finding the common primes in their prime
factorizations.
2. Also find lcm(60, 84).


3. Given that $p_1, \ldots, p_k$ are the primes in the factorizations of a and b, so

$a = p_1^{m_1} \cdots p_k^{m_k}$ and
$b = p_1^{n_1} \cdots p_k^{n_k}$, for some integers $m_1, n_1, \ldots, m_k, n_k \ge 0$,


explain why


$\gcd(a, b) = p_1^{\min(m_1, n_1)} \cdots p_k^{\min(m_k, n_k)}$,

$\operatorname{lcm}(a, b) = p_1^{\max(m_1, n_1)} \cdots p_k^{\max(m_k, n_k)}$.


4. Use these formulas for gcd and lcm to prove gcd(a, b) lcm(a, b) = ab.


Our proof of unique prime factorization in this section comes from the 
division property of Z, via the prime divisor property. In the exercises to the 
last section we showed that the division property also implies the principal 
ideal property of Z, according to which the numbers of the form ma + nb 
are all multiples of a certain nonzero member c. We can also prove the prime 
divisor property from the principal ideal property, as the following exercises 
show. 


5. Suppose that p divides ab, but p does not divide a. Given that the numbers
of the form mp + na are all multiples of some positive c, show that
c = 1.

6. Now deduce the prime divisor property. 


1.4 Irrational Numbers 


The numbers considered so far are the natural numbers and their close relatives 
the integers. A still larger class whose properties derive from those of the 
integers is the set Q of rational numbers: the ratios, or quotients, m/n of 
integers m, n with n ≠ 0. It was once thought that all numbers are rational,
but that hope was dashed (and serious mathematics began) when one of the
followers of Pythagoras discovered that $\sqrt{2}$ is not. This discovery shocked the
Pythagoreans, who sought a “rational” (number) explanation of everything, but
who also knew that $\sqrt{2}$ was a fundamental quantity in geometry: the diagonal
of the unit square. We give a proof of the irrationality of $\sqrt{2}$ by a method that
extends to many other numbers.


Irrationality of $\sqrt{2}$. For any natural numbers m and n with n ≠ 0, $m/n \ne \sqrt{2}$.

Proof. Suppose on the contrary that $\sqrt{2} = m/n$ or, equivalently, that $2n^2 = m^2$
for some natural numbers m and n ≠ 0. Consider the prime factorizations of
$2n^2$ and $m^2$, which must be the same up to the order of factors.

Now the number of occurrences of the prime factor 2 in $2n^2$ must be odd;
namely, the one visible occurrence, plus twice the number of occurrences in n.
But the number of occurrences of 2 in $m^2$ must be even; namely, twice the
number of occurrences in m.

This contradicts unique prime factorization; hence, there are no natural
numbers m and n such that $\sqrt{2} = m/n$. □


The method of comparing numbers of particular primes on both sides of a
hypothetical equation extends to prove irrationality of many square roots;
in fact, it applies to the square root of any natural number that is not a
perfect square. For example, $\sqrt{6} \ne m/n$ because the equation $6n^2 = m^2$,
or $2 \cdot 3n^2 = m^2$, is impossible: there is an odd number of factors 2 on the
left side, but an even number on the right. Once again, this contradicts unique
prime factorization.

The method also extends to cube roots, fourth roots, and so on. For example,
$\sqrt[3]{2} \ne m/n$ because the equivalent equation $2n^3 = m^3$ contradicts unique
prime factorization. The number of factors of 2 in $m^3$ is a multiple of 3 (three
times the number of 2s in the prime factorization of m), whereas the number
of factors of 2 in $2n^3$ is one (the visible 2) plus a multiple of 3 (three times the
number of 2s in the prime factorization of n).
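
The counting of factors of 2 can be spot-checked numerically. The Python sketch below is a finite experiment, not a proof; the function name exponent_of and the range of values tested are arbitrary choices made here.

```python
def exponent_of(p: int, n: int) -> int:
    """Return how many times the prime p occurs in the factorization of n > 0."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count


# The prime 2 occurs an odd number of times in 2n^2, an even number in m^2.
print(all(exponent_of(2, 2 * n * n) % 2 == 1 for n in range(1, 200)))  # True
print(all(exponent_of(2, m * m) % 2 == 0 for m in range(1, 200)))      # True

# For cube roots, the counts are 1 and 0 modulo 3, respectively.
print(all(exponent_of(2, 2 * n**3) % 3 == 1 for n in range(1, 200)))   # True
print(all(exponent_of(2, m**3) % 3 == 0 for m in range(1, 200)))       # True
```

Since the two counts never agree, the equations 2n² = m² and 2n³ = m³ cannot hold for the values tested, in line with the argument from unique prime factorization.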

Thus, irrationality is common, but we will soon see it is not a purely 
negative property. It is in fact useful and interesting. It allows us to create 
classes of irrational “integers” with properties similar to those of the ordinary 
integers. These so-called algebraic integers (a term we will later define in
general) are useful because they allow us to split rational expressions, such
as $x^2 - 2y^2$, into irrational factors that are more manageable. An example is
worked out in the next section.


Exercises 


The Euclidean algorithm of Section 1.1 (repeated subtraction) also applies 
to a pair of numbers whose ratio is irrational, in which case the algorithm 
does not terminate. Indeed, Euclid stated nontermination of the algorithm as a 
criterion for irrationality in his Elements, Book X, Proposition 2. If we apply 
the Euclidean algorithm to the pair $(\sqrt{2}, 1)$ we find a new and enlightening
proof that $\sqrt{2}$ is irrational. Here it will be convenient to work with ordered
pairs (a, b) in which a > b.


1. Check that the first two subtractions for the pair $(\sqrt{2}, 1)$ yield first the pair
$(1, \sqrt{2} - 1)$ and then the pair $(2 - \sqrt{2}, \sqrt{2} - 1)$.

2. Show that the latter pair are in the same ratio as the first pair, and deduce 
that the Euclidean algorithm does not terminate. 

3. Explain, on the other hand, why the algorithm terminates on any pair of 
numbers whose ratio is rational. 


Nontermination of the Euclidean algorithm on the pair $(\sqrt{2}, 1)$ is clear
because the process is periodic; the same ratio recurs every other step. This
periodicity is reflected in the infinite continued fraction for $\sqrt{2}$.


4. Show that $\sqrt{2} + 1 = 2 + (\sqrt{2} - 1) = 2 + \dfrac{1}{\sqrt{2} + 1}$.


5. By repeatedly replacing the denominator $\sqrt{2} + 1$ on the right-hand side of
this equation by the whole right-hand side, show that

$$\sqrt{2} + 1 = 2 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}}$$

and deduce that

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}}$$


If one accepts that infinite continued fractions are meaningful (and they 
are!) then the latter expression is perhaps the clearest possible view of the 
irrationality of $\sqrt{2}$. Any irrational number is infinite in some sense, but a
periodic infinity — the same thing over and over — is the easiest kind to grasp. 


6. If one terminates the continued fraction for $\sqrt{2}$ at its various levels, one
obtains a sequence of fractions that converge to $\sqrt{2}$. Check that the first
few fractions in this sequence are

$$\frac{1}{1},\ \frac{3}{2},\ \frac{7}{5},\ \frac{17}{12},\ \ldots$$

It happens that every other pair x/y in this sequence satisfies $x^2 - 2y^2 = 1$.


1.5 The Equation $x^2 - 2y^2 = 1$


Trial and error shows that the smallest positive integer solution of $x^2 - 2y^2 = 1$
is x = 3, y = 2. We also notice that $1 = 3^2 - 2 \cdot 2^2$, viewed as a difference of
two squares, has an irrational factorization


$1 = (3 - 2\sqrt{2})(3 + 2\sqrt{2})$.

Raising this equation to the nth power gives the equation

$1 = 1^n = (3 - 2\sqrt{2})^n (3 + 2\sqrt{2})^n$. (*)


We now invoke a property of the numbers $a + b\sqrt{2}$, for integers $a$ and $b$, 
that holds only because $\sqrt{2}$ is irrational. Namely, the number $a + b\sqrt{2}$ 
“behaves the same” as its conjugate $a - b\sqrt{2}$ in the following sense: 


Multiplicative property of conjugation. The conjugate of a product is the 
product of the conjugates. 


Proof. Consider the product, for integers $a, b, c, d$, 

\[ (a + b\sqrt{2})(c + d\sqrt{2}) = ac + 2bd + (ad + bc)\sqrt{2}. \]

Its conjugate is $ac + 2bd - (ad + bc)\sqrt{2}$, which indeed is the product of the 
conjugates, $(a - b\sqrt{2})(c - d\sqrt{2})$. □ 

It follows from this property that, if $x_n, y_n$ are the unique integers such that 
$(3 - 2\sqrt{2})^n = x_n - y_n\sqrt{2}$, then the product $(3 + 2\sqrt{2})^n$ of conjugates of 
factors on the left side equals the conjugate of the right side, $x_n + y_n\sqrt{2}$. We 
can now carry on from equation (*) as follows. 

\begin{align*}
1 &= (3 - 2\sqrt{2})^n (3 + 2\sqrt{2})^n \\
  &= (x_n - y_n\sqrt{2})(x_n + y_n\sqrt{2}) \\
  &= x_n^2 - 2y_n^2.
\end{align*}


Thus, the integers $x_n, y_n$ defined by $(3 - 2\sqrt{2})^n = x_n - y_n\sqrt{2}$ are solutions of 
the equation $x^2 - 2y^2 = 1$ for all natural numbers $n$. With a little more work, 
it can be shown that, apart from solutions obtained by changing the sign of $x_n$ 
or $y_n$, these are the only integer solutions. 
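
As a numerical illustration of this construction, the sketch below multiplies numbers of the form $a + b\sqrt{2}$ as integer pairs $(a, b)$, using the product rule from the proof above, and lists the first few $x_n, y_n$. It works with powers of $3 + 2\sqrt{2}$, the conjugate of $3 - 2\sqrt{2}$, so both coordinates stay positive; the helper names `mul` and `solutions` are assumptions made for this example.

```python
def mul(u, v):
    """Multiply a + b*sqrt(2) by c + d*sqrt(2), both given as integer pairs."""
    a, b = u
    c, d = v
    return (a * c + 2 * b * d, a * d + b * c)

def solutions(count):
    """Yield (x_n, y_n) with (3 + 2*sqrt(2))**n = x_n + y_n*sqrt(2), n = 1..count."""
    power = (1, 0)                     # the number 1
    for _ in range(count):
        power = mul(power, (3, 2))
        yield power

for n, (x, y) in enumerate(solutions(5), start=1):
    print(n, x, y, x * x - 2 * y * y)  # last column should always be 1
```

Only integer arithmetic is needed, which reflects the point of the section: the irrational number $\sqrt{2}$ never has to be evaluated.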

The conjugation operation that unlocks these solutions is a property enjoyed 
by the numbers $a + b\sqrt{2}$ that gives them “more structure” than ordinary 
integers. In some sense, the visible structure of the numbers $a + b\sqrt{2}$ reveals 
“hidden structure” in the ordinary integers. We will see more examples of this 
phenomenon in the next chapter. The numbers $a + b\sqrt{2}$ are among those we 
call algebraic integers, because they share many properties with the ordinary 
integers. We list the most fundamental shared properties in the next section. 


Exercises 


The idea of encoding a solution $x = a$, $y = b$ of the equation $x^2 - 2y^2 = 1$ 
by the algebraic integer $a + b\sqrt{2}$ greatly clarifies the set of all solutions. It 
enables us to show that the solutions $x = x_n$, $y = y_n$ found above are in fact 
all the solutions (up to $\pm$ signs). 


1. Show that if $a_1 + b_1\sqrt{2}$ and $a_2 + b_2\sqrt{2}$ encode solutions, then so does 
their product $(a_1 + b_1\sqrt{2})(a_2 + b_2\sqrt{2})$. 
2. Show that if $a + b\sqrt{2}$ encodes a solution, so too does $(a + b\sqrt{2})^{-1}$. 


It follows from these two exercises that we can speak of the product and inverse 
of ordered pairs $(a,b)$ such that $a^2 - 2b^2 = 1$; namely, by taking product and 
inverse of their “proxies” $a + b\sqrt{2}$. Next, we divide the solution pairs $(a,b)$ 
into classes: the positive pairs, for which $a > 0$, and the negative pairs, for 
which $a < 0$. We now concentrate on the positive pairs, since negative pairs 
are easily recovered from the positive pairs. 


3. Show that if $(a, b)$ is a positive pair, then $a + b\sqrt{2}$ is positive, and so are 
the product and inverse of positive pairs. 


By taking logarithms of the positive numbers $a + b\sqrt{2}$, we can now talk about 
sum and negative of positive solution pairs $(a,b)$. This view has the advantage 
of generating proxies of solution pairs by simple addition, and of ordering 
solution pairs by the size of the numbers $\log(a + b\sqrt{2})$. 


4. Show that the numbers $\log(a + b\sqrt{2})$, for positive solution pairs $(a, b)$, 
include the numbers $\log(x_n + y_n\sqrt{2}) = n\log(3 + 2\sqrt{2})$ for each 
integer $n$. 


The question that remains is whether the solution pairs $(a, b)$ include any other 
than those already found. This is where it helps to view $\log(a + b\sqrt{2})$ as the 
size of the solution $x = a$, $y = b$. 


5. Suppose that $(a,b)$ is a positive solution pair not equal to any pair $(x_n, y_n)$, 
and suppose that $(x_m, y_m)$ is the solution pair “nearest” to $(a,b)$ in the 
sense that $|\log(x_m + y_m\sqrt{2}) - \log(a + b\sqrt{2})|$ is as small as possible. 
Deduce that there is a positive solution pair $(a', b')$ such that 

\[ 0 < \log(a' + b'\sqrt{2}) < \log(3 + 2\sqrt{2}), \]

which is a contradiction. 


1.6 Rings 


The system $\mathbb{Z} = \{\ldots,-2,-1,0,1,2,3,4,\ldots\}$ of integers is the motivating 
example for the concept of ring, a set of objects with sum and product 
operations that share the following properties (also called the ring axioms) 
with $\mathbb{Z}$: 


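\[
\begin{array}{lll}
a + b = b + a & ab = ba & \text{(commutative laws)} \\
a + (b + c) = (a + b) + c & a(bc) = (ab)c & \text{(associative laws)} \\
a + 0 = a & a \cdot 1 = a & \text{(identity laws)} \\
a + (-a) = 0 & & \text{(inverse law)} \\
a(b + c) = ab + ac & & \text{(distributive law)}
\end{array}
\]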


Thus, in a ring $R$ we can add, subtract (by defining $a - b$ as $a + (-b)$), and 
multiply using the usual rules for calculation with whole numbers.¹ We can 
also define the concept of divisibility by saying that 

\[ b \text{ divides } a \iff a = bc \text{ for some } c \in R. \]


In fact, much of ring theory is motivated by trying to carry over Euclid’s 
theory of primes to rings in general. This is not completely straightforward. 
For example, we might try to define a prime to be an element divisible only by 
itself and 1, but we know that even in $\mathbb{Z}$, a prime $p$ is divisible by $\pm 1$ and $\pm p$, 
due to the fact that $-1$ divides $1$. Thus, the theory of divisibility is complicated 
by elements that divide 1, which are called units. 

An example with infinitely many units is the ring 

\[ \mathbb{Z}[\sqrt{2}] = \{a + b\sqrt{2} : a, b \in \mathbb{Z}\} \]

that we used to find integer solutions of $x^2 - 2y^2 = 1$ in the previous section. 
The numbers $x_n - y_n\sqrt{2}$ defined there are units of $\mathbb{Z}[\sqrt{2}]$, because each of 
them divides 1; in fact 

\[ 1 = (x_n - y_n\sqrt{2})(x_n + y_n\sqrt{2}). \]


¹ The “usual rules” include many things that are not in the list of properties above. Indeed, the 
ring axioms have been chosen to be minimal, and some well-known properties can be derived 
from them only with some ingenuity. These properties include $a \cdot 0 = 0$ and $(-1)\cdot(-1) = 1$, 
which are posed as problems in Exercise 1 below. Also, it should be mentioned that our concept 
of ring is sometimes called a commutative ring. There is a more general concept of “ring” that 
allows noncommutative multiplication. 

Figure 1.1 Simon Stevin (1548-1620). Licensed under Creative Commons 
Attribution-ShareAlike 4.0 International License. 


Another complication is that unique prime factorization may fail, even with 
the usual proviso “up to unit factors.” We will see an example in Chapter 2. 
An important example where it does not fail is the polynomial ring $\mathbb{Q}[x]$ 
of polynomials in $x$ with rational coefficients. It is clear that $\mathbb{Q}[x]$ is a ring 
under the usual sum and product of polynomials, and that any nonzero rational 
number is a unit. However, $\mathbb{Q}[x]$ has the division property in the following 
form: 


Division property for polynomials. If $a(x)$ and $b(x) \ne 0$ are in $\mathbb{Q}[x]$, then 
there are polynomials $q(x)$ and $r(x)$ in $\mathbb{Q}[x]$ such that 

\[ a(x) = b(x)q(x) + r(x) \quad\text{where } \operatorname{degree}(r) < \operatorname{degree}(b). \]


Proof. This property falls out of the usual “long division” process for polynomials. 
If $\operatorname{degree}(a) < \operatorname{degree}(b)$, we can take $q(x) = 0$ and $r(x) = a(x)$, and 
we already have $\operatorname{degree}(r) < \operatorname{degree}(b)$. 

If $a(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0$ and $b(x) = b_m x^m + b_{m-1}x^{m-1} 
+ \cdots + b_0$ and $n \ge m$, we begin by subtracting $\frac{a_n}{b_m}x^{n-m}b(x)$ from $a(x)$ to 
remove the term of highest degree in $a(x)$. Then we similarly remove the term 
of highest degree in $a(x) - \frac{a_n}{b_m}x^{n-m}b(x)$, and so on. This can continue as far 
as the degree $m$ term in $a(x)$, at which stage we have subtracted a polynomial 
$q(x)b(x)$ from $a(x)$ and the remainder $r(x)$ has degree less² than that 
of $b(x)$. □ 
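
The long division in this proof can be sketched in code. The example below is one possible rendering, assuming polynomials are stored as coefficient lists with the constant term first and a nonzero last entry; that representation and the name `poly_divmod` are choices made for this illustration. Exact rational arithmetic comes from Python's `fractions` module.

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Divide a(x) by b(x) != 0 in Q[x].

    Polynomials are coefficient lists, constant term first, so x^3 - 2 is
    [-2, 0, 0, 1] and the zero polynomial is []. The last entry of b is
    assumed nonzero. Returns (q, r) with a = b*q + r and
    degree(r) < degree(b).
    """
    r = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift = len(r) - len(b)        # n - m in the proof
        coeff = r[-1] / b[-1]          # a_n / b_m
        q[shift] = coeff
        for i, c in enumerate(b):      # subtract coeff * x^shift * b(x)
            r[i + shift] -= coeff * c
        r.pop()                        # the leading term is now zero
        while r and r[-1] == 0:        # drop any further zero leading terms
            r.pop()
    return q, r

# x^3 - 2 divided by x^2 - 1
print(poly_divmod([-2, 0, 0, 1], [-1, 0, 1]))
```

Each pass of the outer loop is one subtraction step of the proof, and the loop stops exactly when the remainder has degree less than that of $b(x)$.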


It was observed as early as 1585, by Simon Stevin, that this property gives 
a Euclidean algorithm for polynomials, enabling us to find their gcd. 

It then follows, much as in Section 1.2, that in $\mathbb{Q}[x]$, the gcd of $a(x)$ and 
$b(x)$ is expressible in the form $a(x)m(x) + b(x)n(x)$ for some polynomials 
$m(x)$ and $n(x)$. This property in turn gives a prime divisor property and 
unique prime factorization in $\mathbb{Q}[x]$ by the argument of Section 1.3. In this case, 
factorization is unique up to nonzero rational multiples of the factors. 


² Notice that, to maintain that $\operatorname{degree}(r) < \operatorname{degree}(b)$ when $b(x)$ is a nonzero constant and 
$r(x) = 0$, we need the degree of 0 to be less than 0. The usual convention is that 
$\operatorname{degree}(0) = -\infty$. 


https://doi.org/10.1017/9781009004138.003 Published online by Cambridge University Press 


Notation and terminology. For any ring R we let R[x] denote the ring of 
polynomials with coefficients in R and “variable” or “indeterminate” x. We 
also let R[a] denote the set of values obtained by substituting a value a for x. 


For example, $\mathbb{Z}[x]$ is the ring of polynomials with integer coefficients 
and $\mathbb{Z}[\sqrt{2}]$ is the set of values obtained by substituting $\sqrt{2}$ for $x$ in these 
polynomials. These values are precisely the numbers $a + b\sqrt{2}$ where $a, b \in \mathbb{Z}$. 


1.6.1 Euclidean Domains 


The Euclidean algorithm is not the only route to unique prime factorization, 
but it is common enough to be worth identifying the rings that possess such 
an algorithm. Guided by the two examples we have seen so far, $\mathbb{Z}$ and 
$\mathbb{Q}[x]$, we capture such rings in the following definition, which says that a 
division property holds. It is also usual to assume that the ring contains no 
zero divisors; that is, nonzero elements whose product is zero. 


Definition. A Euclidean domain $R$ is a ring with no zero divisors and with a 
degree function $d : R - \{0\} \to \mathbb{N}$ such that, for all $a, b \in R$ with $b \ne 0$, there 
are $q, r \in R$ such that 

\[ a = qb + r, \quad\text{with either } r = 0 \text{ or } d(r) < d(b). \]

In the case of the polynomial ring $\mathbb{Q}[x]$, the degree function is of course the 
polynomial degree. For $n \in \mathbb{Z} - \{0\}$, we can take $d(n) = |n|$. 

In a Euclidean domain we can implement the Euclidean algorithm by 
repeated division with remainder. Given $a, b \in R$ with $b \ne 0$ we take $q, r$ so 

\[ a = qb + r, \quad\text{with } d(r) < d(b), \]

and let $a_1 = b$, $b_1 = r$. Then any common divisor of $a, b$ divides $r$ and hence 
also $a_1, b_1$, and we have $d(b_1) = d(r) < d(b)$. We repeat the process with 
$a_1, b_1$, similarly obtaining 

\[ a_1 = q_1 b_1 + b_2 \quad\text{with } d(b_2) < d(b_1) \text{ and } \gcd(a_1, b_1) = \gcd(a, b). \]

Then setting $a_2 = b_1$ gives $a_2, b_2$ with the same common divisors as $a_1, b_1$ 
(so $\gcd(a_2, b_2) = \gcd(a_1, b_1) = \gcd(a, b)$) and so on. In general we get 

\[ a_i = q_i b_i + b_{i+1} \quad\text{with } d(b_{i+1}) < d(b_i) \text{ and } \gcd(a_i, b_i) = \gcd(a, b). \]

Since the $d(b_i)$ are decreasing natural numbers, the process ends necessarily 
when $b_k$ divides $a_k$ exactly, and hence $b_k = \gcd(a_k, b_k) = \gcd(a, b)$. 

Thus, we have a Euclidean algorithm for gcd, and its consequences follow 
as they did for the Euclidean algorithm in $\mathbb{Z}$ or $\mathbb{Q}[x]$. In particular, we have the 
prime divisor property (if prime $p$ divides $ab$, then $p$ divides $a$ or $p$ divides 
$b$), with the consequence that prime factorization is unique up to divisors of 1. 
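
The repeated division described above does not depend on which Euclidean domain is used, only on having a division-with-remainder operation. A minimal sketch of that idea follows; the function name `euclid_gcd` and its `divide` parameter are assumptions made for this example. It is demonstrated on the integers, where Python's built-in `divmod` serves as the division and $d(n) = |n|$.

```python
def euclid_gcd(a, b, divide):
    """Run the Euclidean algorithm, given a division-with-remainder function.

    `divide(a, b)` must return (q, r) with a = q*b + r, where r is either
    zero or of smaller degree than b, as in the definition of a Euclidean
    domain.
    """
    while b != 0:
        _, r = divide(a, b)
        a, b = b, r        # the next pair is (b_i, remainder)
    return a               # the last nonzero b_i

print(euclid_gcd(1071, 462, divmod))   # integers, with d(n) = |n|
```

Passing a different `divide` function, for example one for polynomials, reuses the same loop unchanged, provided the zero element of the domain compares equal to `0`.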


Exercises 


The ring axioms above are chosen to be economical, so they do not explicitly 
include some familiar properties. To see some properties that are implicit in 
the ring axioms, prove the following: 


1. Uniqueness of additive inverse: If $a + b = 0$, then $b = -a$. Prove this by 
adding $-a$ to both sides and applying commutative and associative laws. 
Then prove in succession 

\[ a \cdot (-1) = -a, \qquad a \cdot 0 = 0, \qquad (-1)\cdot(-1) = 1. \]


Rings of polynomials will be particularly important in later chapters, so we 
take this opportunity to practice working with them. 


2. Let $R[x_1, \ldots, x_n]$ denote the polynomials in $x_1, \ldots, x_n$ with coefficients in 
a ring $R$. Use induction on $n$ to prove that $R[x_1, \ldots, x_n]$ is a ring. 
3. Use the Euclidean algorithm for polynomials in $\mathbb{Q}[x]$ to prove that 

\[ \gcd(x^3 - 2, x^2 - 1) = \gcd(x^2 - 1, x - 2) = \gcd(x - 2, 3) = 1. \]

4. Also, by observing the combinations of $x^3 - 2$ and $x^2 - 1$ computed by 
the algorithm, find polynomials $m(x)$ and $n(x)$ such that 

\[ (x^3 - 2)m(x) + (x^2 - 1)n(x) = 1. \]

(We revisit this example in the exercises to Section 4.4.) 


The ring $\mathbb{Z}[\sqrt{2}] = \{r + s\sqrt{2} : r, s \in \mathbb{Z}\}$ is a Euclidean domain with degree 
function $d(r + s\sqrt{2}) = |r^2 - 2s^2|$. To prove this, first prove that $d$ extends to 
numbers $r + s\sqrt{2}$ for rational $r, s$ and is multiplicative. 


5. Use the multiplicative property of conjugation from Section 1.5 to show 

\[ d((r + s\sqrt{2})(t + u\sqrt{2})) = d(r + s\sqrt{2})\, d(t + u\sqrt{2}) \]

and observe that the proof holds for rational $r, s, t, u$. 


Now suppose $a, b \in \mathbb{Z}[\sqrt{2}]$ with $b \ne 0$. We seek a “quotient” $q \in \mathbb{Z}[\sqrt{2}]$ 
with $d(a - qb) < d(b)$ or, by the multiplicative property of $d$, 
$d\left(\frac{a}{b} - q\right) < d(1) = 1$. 

6. Given $a = r + s\sqrt{2}$ and $b = t + u\sqrt{2}$ in $\mathbb{Z}[\sqrt{2}]$, use the conjugate of $b$ to 
show that $\frac{a}{b} = v + w\sqrt{2}$ for rational $v, w$. 

7. Choosing integers $x, y$ so that $|v - x| \le \frac{1}{2}$ and $|w - y| \le \frac{1}{2}$, show that 
$q = x + y\sqrt{2}$ satisfies $d\left(\frac{a}{b} - q\right) < 1$, as required. 


It follows that unique prime factorization holds in $\mathbb{Z}[\sqrt{2}]$, up to divisors of 1. 
However, as we saw above, there are infinitely many divisors of 1 in $\mathbb{Z}[\sqrt{2}]$. 
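
The division with remainder outlined in Exercises 6 and 7 can be tried numerically. The sketch below is an illustration under stated assumptions: elements $r + s\sqrt{2}$ are stored as integer pairs `(r, s)`, the names `norm` and `divide` are hypothetical, and Python's `round` on a `Fraction` is used to pick nearest integers.

```python
from fractions import Fraction

def norm(r, s):
    """The degree function d(r + s*sqrt(2)) = |r^2 - 2 s^2|."""
    return abs(r * r - 2 * s * s)

def divide(a, b):
    """Division with remainder in Z[sqrt(2)]; elements are integer pairs (r, s)."""
    (r, s), (t, u) = a, b
    n = t * t - 2 * u * u                  # b times its conjugate, nonzero if b != 0
    # a/b = a * conj(b) / n = v + w*sqrt(2) with rational v, w
    v = Fraction(r * t - 2 * s * u, n)
    w = Fraction(s * t - r * u, n)
    x, y = round(v), round(w)              # nearest integers, so |v-x|, |w-y| <= 1/2
    # remainder a - q*b, where q = x + y*sqrt(2)
    rem = (r - (x * t + 2 * y * u), s - (x * u + y * t))
    return (x, y), rem

q, rem = divide((7, 3), (2, 1))
print(q, rem, norm(*rem), "<", norm(2, 1))
```

The printed comparison shows the remainder having smaller degree than the divisor, which is the property the exercises ask the reader to prove in general.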


1.7 Fields 


A field is a ring with the additional property that every nonzero element $a$ has 
a multiplicative inverse; that is, an element $a^{-1}$ such that $aa^{-1} = 1$. Thus, 
the field axioms are: 

\[
\begin{array}{lll}
a + b = b + a & ab = ba & \text{(commutative laws)} \\
a + (b + c) = (a + b) + c & a(bc) = (ab)c & \text{(associative laws)} \\
a + 0 = a & a \cdot 1 = a & \text{(identity laws)} \\
a + (-a) = 0 & aa^{-1} = 1 \text{ for } a \ne 0 & \text{(inverse laws)} \\
a(b + c) = ab + ac & & \text{(distributive law)}
\end{array}
\]

The multiplicative inverse³ allows us to divide by any nonzero $a$; namely, by 
defining the quotient $b/a$ or $b \div a$ as $ba^{-1}$. 

In a field we can add, subtract, multiply, and divide using the usual rules for 
calculation with numbers. The obvious example of a field is the system $\mathbb{Q}$ of 
rational numbers, but there are many other examples, both larger and smaller 
than $\mathbb{Q}$. Particularly important examples are the finite fields $\mathbb{F}_p$ for each prime 
number $p$. The elements of $\mathbb{F}_p$ are the congruence classes $[0], [1], [2], \ldots, 
[p-1]$, where 

\[ [a] = \{a + np : n \in \mathbb{Z}\} \]

and the sum and product operations are defined by 

\[ [a] + [b] = [a + b] \quad\text{and}\quad [a] \cdot [b] = [ab]. \]


The set $[a]$ is called the congruence class of $a$ because its members are the 
integers $a'$ congruent to $a$ modulo $p$; that is, such that $p$ divides $a - a'$. Another 
way to put it is that $a$ and $a'$ leave the same remainder on division by $p$. We 
also write the congruence relation 

\[ a' \equiv a \pmod{p}. \]

³ We can say the multiplicative inverse because it is easily proved to be unique. Namely, if $b$ is 
such that $ab = 1$, then we can prove that $b = a^{-1}$ by multiplying both sides of the equation 
$ab = 1$ by $a^{-1}$ and applying the commutative and associative laws. 

The $+$ and $\cdot$ operations are well-defined on congruence classes; that is, 

\[ [a'] = [a] \text{ and } [b'] = [b] \implies [a' + b'] = [a + b] \text{ and } [a'b'] = [ab]. \]

This is easily checked for sums. For products we suppose that $a' = a + mp$, 
$b' = b + np$ and calculate: 

\[ [a' \cdot b'] = [(a + mp)(b + np)] = [ab + (mb + na + mnp)p] = [ab]. \]


These calculations show that $\mathbb{F}_p$ is a ring for any integer $p$. The ring 
properties are “inherited” from those of $\mathbb{Z}$. For example, $a + b = b + a$ in 
$\mathbb{Z}$, so 

\[ [a] + [b] = [a + b] = [b + a] = [b] + [a]. \]

The crucial property that makes $\mathbb{F}_p$ a field, the multiplicative inverse, is due to 
the primality of $p$. A multiplicative inverse of $[a]$ is a $[b]$ such that $[a][b] = [1]$. 
Since $[a][b] = [ab]$ and $[1] = \{1 + np : n \in \mathbb{Z}\}$, we seek a $b$ such that 

\[ ab = 1 + np \quad\text{for some } n \in \mathbb{Z}. \]

Equivalently, we seek $b$ such that $1 = ab - np$. This reminds us of something 
we saw in Section 1.3: since $p$ is prime and does not divide $a$ (because $[a] \ne [0]$), 
we have $1 = \gcd(a, p) = ma + np$ for some $m, n \in \mathbb{Z}$. Thus we 
only have to rename the integer $n$ as $-n$, and we can take $b = m$. 
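
The argument just given is effectively a recipe for computing inverses in $\mathbb{F}_p$: run the Euclidean algorithm on $a$ and $p$ while keeping track of the coefficients $m, n$. The following sketch shows one common way to do that bookkeeping (often called the extended Euclidean algorithm); the function names are illustrative and not taken from the text.

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b."""
    m0, n0, m1, n1 = 1, 0, 0, 1        # coefficients expressing a and b
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        m0, n0, m1, n1 = m1, n1, m0 - q * m1, n0 - q * n1
    return a, m0, n0

def inverse_mod_p(a, p):
    """Return b with a*b = 1 (mod p), assuming p is prime and p does not divide a."""
    g, m, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a has no inverse mod p")
    return m % p                       # reduce m to a representative in 0..p-1

print([inverse_mod_p(a, 7) for a in range(1, 7)])
```

Reducing $m$ modulo $p$ at the end only changes the representative of the class $[b]$, not the class itself.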

A famous theorem of number theory can be reinterpreted in $\mathbb{F}_p$ (the original 
interpretation may be found in Section 1.9): 


Fermat’s little theorem. If $[a] \in \mathbb{F}_p$ is nonzero, then 

\[ [a]^{p-1} = [1]. \]

Proof. Consider the nonzero elements $[1], [2], \ldots, [p-1]$ of $\mathbb{F}_p$. For any 
nonzero $[a]$ the elements $[a][1], [a][2], \ldots, [a][p-1]$ are nonzero. And 
they are distinct, because we can recover $[1], [2], \ldots, [p-1]$ from 
$[a][1], [a][2], \ldots, [a][p-1]$ by multiplying the latter elements by the inverse 
$[b]$ of $[a]$. 

Thus $[a][1], [a][2], \ldots, [a][p-1]$ are the same elements as $[1], [2], \ldots, 
[p-1]$, except possibly in a different order. At any rate, both sequences have 
the same product: 

\begin{align*}
[1] \cdot [2] \cdots [p-1] &= [a][1] \cdot [a][2] \cdots [a][p-1] \\
&= [a]^{p-1} [1] \cdot [2] \cdots [p-1].
\end{align*}

So if we cancel $[1], [2], \ldots, [p-1]$ from both sides by multiplying by their 
inverses, we get $[1] = [a]^{p-1}$, or $[a]^{p-1} = [1]$, as required. □ 


1.7.1 Finite Rings That Are Not Fields 


The idea of congruence is not restricted to congruence mod p, where p is 
prime. More generally, one has congruence modulo n (or mod n for short) for 
any nonzero integer n, usually taken to be positive. The symbolism 


\[ a' \equiv a \pmod{n} \]

means that $n$ divides $a - a'$, and the numbers congruent to integer $a$, mod $n$, 
form the congruence class of $a$, mod $n$, also written $[a]$. By the same 
arguments as above, one finds that sum and product are well defined on 
congruence classes, and that the congruence classes form a ring under these 
operations. 

However, if n is not prime, the congruence classes mod n do not form a 
field, because not every nonzero class [a] has an inverse. A simple example 
consists of the congruence classes mod 6: [0], [1], [2], [3], [4], [5]. The classes 
[2], [3] are not equal to [0], yet 


\[ [2] \cdot [3] = [6] = [0]. \]


Because of this, [2] can have no inverse [a], otherwise multiplying both sides 
by [a] would give [3] = [a][0] = [0], which is false. Similarly, [3] can have 
no inverse. 

In the general case where $n$ is not prime, say $n = lm$ with $1 < l, m < n$, one similarly finds that 
$[l], [m] \ne [0]$ but $[l][m] = [0]$, so neither $[l]$ nor $[m]$ can have an inverse among 
the congruence classes mod $n$. As mentioned in Section 1.6, nonzero elements 
whose product is zero, such as $[l]$ and $[m]$ here, are called zero divisors. We 
will see in Section 4.2 that zero divisors are the only things that prevent a ring 
being extended to a field, the way we extend the ring $\mathbb{Z}$ to the field $\mathbb{Q}$. 
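
For small $n$ the situation can be inspected by brute force. The sketch below is a hypothetical helper, not part of the text; it lists which nonzero classes mod $n$ have inverses and which are zero divisors, using representatives $1, \ldots, n-1$.

```python
def units_and_zero_divisors(n):
    """Classify the nonzero classes [a] mod n by trying every product."""
    units, zero_divisors = [], []
    for a in range(1, n):
        if any(a * b % n == 1 for b in range(1, n)):
            units.append(a)            # [a][b] = [1] for some [b]
        if any(a * b % n == 0 for b in range(1, n)):
            zero_divisors.append(a)    # [a][b] = [0] for some nonzero [b]
    return units, zero_divisors

print(units_and_zero_divisors(6))   # ([1, 5], [2, 3, 4])
print(units_and_zero_divisors(7))   # ([1, 2, 3, 4, 5, 6], [])
```

For $n = 6$ the classes $[2]$, $[3]$, and $[4]$ are zero divisors and so have no inverses, whereas for the prime $n = 7$ every nonzero class is a unit.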


Exercises 


A ring may have no zero divisors without being a field: Z, for example. But a 
finite ring with no zero divisors is a field, as can be seen by an argument rather 
like the proof of Fermat’s little theorem. (We revisit this proof in Section 9.4.) 


1. Suppose that $R$ is a finite ring with no zero divisors, whose nonzero elements 
are $a_1, a_2, \ldots, a_n$. If $a$ is any nonzero element of $R$, then 
$aa_1, aa_2, \ldots, aa_n$ are nonzero. Why? 

2. Prove also that $aa_i = aa_j$ implies $a_i = a_j$. 

3. Deduce that $aa_1, aa_2, \ldots, aa_n$ include the element 1, and hence that $a$ has 
a multiplicative inverse. 


1.8 Factors of Polynomials 


The proof of the division property of polynomials in Q[x], in Section 1.6, 
applies to polynomials with coefficients in any field F, since it uses only 
the ordinary rules for addition, subtraction, multiplication, and division. With 
this remark we can extend a classical theorem about roots and factors, due 
to (Descartes, 1637, p. 159), from polynomials with numerical coefficients to 
polynomials with coefficients in any field F. We denote this ring of polynomials 
by F[x]. 


Factor theorem. If $f(x) \in F[x]$ and $f(a) = 0$ for some $a \in F$, then $f(x)$ has 
a factor $x - a$; that is, $f(x) = g(x)(x - a)$ for some $g(x) \in F[x]$. 

Proof. According to the division property, division of $f(x)$ by $x - a$ gives 

\[ f(x) = g(x)(x - a) + r(x), \]

where $r(x)$ is a polynomial of degree less than that of $x - a$, so $r(x)$ is constant. 
Also, substituting $a$ for $x$ in this equation gives 

\[ f(a) = g(a) \cdot 0 + r(a) = r(a), \]

hence the constant polynomial $r(x) = 0$, since $f(a) = 0$. 
Thus, $f(x) = g(x)(x - a)$. □ 


Corollary. A polynomial $f(x)$ of degree $n$ in $F[x]$ has at most $n$ roots in $F$. 


Proof. Each distinct root $a$ corresponds to a distinct factor $x - a$ in $f(x)$. Since 
$f(x)$ has degree $n$, $f(x)$ has at most $n$ such factors, hence at most $n$ roots in $F$. □ 
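
A small experiment shows that the field hypothesis in this corollary matters. The sketch below (the function name `roots_mod` is an assumption for the example) counts roots of $x^2 - 1$ among the congruence classes mod 7, which form a field, and mod 8, which form a ring with zero divisors.

```python
def roots_mod(coeffs, n):
    """Roots in {0, ..., n-1} of the polynomial with the given integer
    coefficients (constant term first), evaluated modulo n."""
    def value(x):
        return sum(c * pow(x, k, n) for k, c in enumerate(coeffs)) % n
    return [x for x in range(n) if value(x) == 0]

print(roots_mod([-1, 0, 1], 7))   # x^2 - 1 over F_7: [1, 6]
print(roots_mod([-1, 0, 1], 8))   # same polynomial mod 8: [1, 3, 5, 7]
```

Mod 8 the degree 2 polynomial has four roots, so the bound of the corollary can fail once zero divisors are present; over $\mathbb{F}_7$ it holds, as the corollary guarantees.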


This corollary, which is often attributed to Lagrange, simplifies the proof of 
another theorem of Fermat that we will see in Section 2.8. 


1.8.1 Factor Theorem for Polynomials over a Ring 


The factor theorem above is a neat application of division with remainder, but 
a stronger factor theorem can be obtained with only multiplication. 


Factor theorem over a ring. If $R$ is a ring, $f(x) \in R[x]$, and $f(a) = 0$ for 
some $a \in R$, then $f(x) = g(x)(x - a)$ for some $g(x) \in R[x]$. 


Proof. First observe that 

\begin{align*}
(x - a)(x^{m-1} + x^{m-2}a + \cdots + xa^{m-2} + a^{m-1}) 
&= (x^m + x^{m-1}a + \cdots + xa^{m-1}) - (x^{m-1}a + \cdots + xa^{m-1} + a^m) \\
&= x^m - a^m.
\end{align*}

This calculation uses only the distributive property and basic properties of 
addition, so it holds in any ring. 
Now suppose $f(x) = a_n x^n + \cdots + a_1 x + a_0$, where $a_0, \ldots, a_n \in R$. Then 

\[ f(x) - f(a) = a_n(x^n - a^n) + \cdots + a_1(x - a), \]

where $x - a$ is a factor of each term on the right-hand side, by the calculation 
above. Therefore, 

\[ f(x) - f(a) = g(x)(x - a), \quad\text{where } g(x) \in R[x], \]

because all the coefficients of $g(x)$ arise from $a, a_0, \ldots, a_n$ by addition, 
subtraction, and multiplication, and hence they belong to $R$. 

Finally, if $f(a) = 0$, we are left with $f(x) = g(x)(x - a)$. □ 
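
The quotient $g(x)$ can be computed using ring operations alone. The sketch below does so by Horner's scheme (synthetic division), which is a different bookkeeping from the proof's use of $x^m - a^m$ but arrives at a quotient and value satisfying the same identity $f(x) - f(a) = g(x)(x - a)$. The function name and the highest-degree-first coefficient order are choices made for this example, and a nonempty coefficient list is assumed.

```python
def divide_by_x_minus_a(coeffs, a):
    """Return (g, f(a)) with f(x) - f(a) = g(x) * (x - a).

    `coeffs` lists the coefficients of f, highest degree first, and must be
    nonempty. Only addition and multiplication are used, so integer input
    gives integer output.
    """
    partial = []
    acc = 0
    for c in coeffs:
        acc = acc * a + c      # Horner step
        partial.append(acc)
    value = partial.pop()      # the last accumulated value is f(a)
    return partial, value      # the earlier values are the coefficients of g

# f(x) = x^3 - 6x^2 + 11x - 6 has the root a = 1
print(divide_by_x_minus_a([1, -6, 11, -6], 1))   # ([1, -5, 6], 0)
```

No division occurs anywhere, which is why the same computation is valid for coefficients in any ring, for example the integers.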
Corollary. If $f(x_1, \ldots, x_n) \in R[x_1, \ldots, x_n]$ and $f = 0$ when $x_i = a$ for some 
$a \in R$, then $f(x_1, \ldots, x_n) = g(x_1, \ldots, x_n)(x_i - a)$, for some $g(x_1, \ldots, x_n) \in 
R[x_1, \ldots, x_n]$. 

Proof. View the multivariate polynomial $f(x_1, \ldots, x_n)$ as a single-variable 
polynomial $f^*(x_i)$, with coefficients in the ring 

\[ R^* = R[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n], \]

which obviously contains $R$. Thus, we have $f^*(x_i) \in R^*[x_i]$ with $f^*(a) = 0$ 
for some $a \in R^*$. 
It follows from the theorem above that 

\[ f(x_1, \ldots, x_n) = f^*(x_i) = g^*(x_i)(x_i - a), \quad\text{where } g^*(x_i) \in R^*[x_i]. \]

And, since $R^* = R[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]$, $g^*(x_i)$ is in fact a multivariate 
polynomial $g(x_1, \ldots, x_n) \in R[x_1, \ldots, x_n]$, as claimed. □ 


Exercises 


The high school algebra involved in the factor theorem over a ring has several 
interesting spinoffs. 


1. Use the Corollary above to find three factors of 

\[ f(x, y, z) = xy^2 + yz^2 + zx^2 - x^2y - y^2z - z^2x, \]

and hence factorize this polynomial. 
2. Find $g(x)$ such that $x^3 + a^3 = g(x)(x + a)$. 
3. Show, by finding a variant of the factorization of $x^m - a^m$, that $x^m + a^m$ 
has a factor $x + a$ when $m$ is odd. What is $g(x)$ in this case? 

These factorizations of polynomials influence the factorization of integers. 
In particular, they prevent certain numbers of the form $2^m + 1$ from being 
primes. 


4. Explain why $2^m + 1$ is not prime when $m > 1$ is odd. 
5. Show in fact that $2^m + 1$ is prime only when $m$ is a power of 2. 


The only known primes of the form $2^{2^h} + 1$ are those with $h = 0, 1, 2, 3, 4$. 
They are called Fermat primes because Fermat mistakenly conjectured that 
all numbers of the form $2^{2^h} + 1$ are prime. Euler found that this is not true for 
$h = 5$. 


6. Check that 641 divides $2^{2^5} + 1$. 
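
These numbers are small enough to examine directly. The sketch below uses plain trial division, which is adequate only for numbers of this size; the helper name `smallest_factor` is hypothetical. It reports, for $h = 0, \ldots, 5$, whether $2^{2^h} + 1$ is prime.

```python
def smallest_factor(n):
    """Smallest factor of n greater than 1, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n                   # no factor up to sqrt(n), so n is prime

for h in range(6):
    f = 2 ** (2 ** h) + 1
    p = smallest_factor(f)
    print(h, f, "prime" if p == f else f"divisible by {p}")
```

The case $h = 5$ stops as soon as the trial divisor reaches 641, in line with Euler's observation.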


1.9 Discussion 


For an excellent general history of algebra, from ancient times until the 20th 
century, see Katz and Parshall (2014). Their unifying theme is the solution of 
equations — “taming the unknown” — with generally no restriction on what type 
of solution is allowed. In this book we explore what happens when only integer 
solutions are sought, and more detailed historical information on this particular 
theme may be found in the book of Bashmakova and Smirnova (2000) and 
the chapter on algebra (also coauthored by Bashmakova) in Kolmogorov and 
Yushkevich (2001). Another useful history, which appeared after most of this 
book was written, is Gray (2018). Gray’s book is particularly useful for its 
study of the theory of algebraic integers in the 19th century. 


1.9.1 Arithmetic from Euclid to Fermat 


The arithmetic thread of algebra, like many threads in mathematics, can be 
traced back to Euclid’s Elements. The Elements is best known for its treatment 
of geometry, but its treatment of arithmetic is equally important. As we have 
seen, the Elements is the first known source of the Euclidean algorithm, the 
proof that there are infinitely many primes, the prime divisor property (and, 
implicitly, unique prime factorization), and the irrationality proof for $\sqrt{2}$. 
Not so well known is Euclid’s use of induction, in the form known as 
“infinite descent” or the well-ordering of $\mathbb{N}$. For example, in proving the 
existence of prime factorization, Euclid argues, as we did in Section 1.1, that 
natural numbers cannot decrease forever (Elements, Book VII, Proposition 31). 
He implicitly does the same in assuming (Book VII, Proposition 1) that the 
Euclidean algorithm will terminate. For a long time mathematicians used 
induction for special tasks like this, often unconsciously, without realizing that 
induction is a foundation for virtually everything nontrivial in arithmetic. The 
first to realize this was Grassmann (1861), who gave inductive proofs of all 
the ring properties of $\mathbb{N}$. In the 1880s Peano and Dedekind used the idea to the 
hilt by making induction the fundamental axiom of arithmetic in the so-called 
Peano axioms. 

The Euclidean algorithm also was not recognized as a fundamental part 
of arithmetic until the 19th century. This was partly because algorithms in 
general were not recognized as a special category of processes (indeed, this did 
not happen until the 20th century), but also because the Euclidean algorithm 
in particular could be subsumed under the topic of continued fractions. At 
least, this was the situation in Europe. The Euclidean algorithm had long been 
used in India, for purposes such as solving linear and quadratic Diophantine 
equations, under the name of the “pulveriser.” For more on this story, see 
Weil (1984). In Europe, continued fractions were a topic that also encompassed 
quadratic Diophantine equations, especially the so-called Pell equation 
$x^2 - Ny^2 = 1$ for nonsquare positive integer values of $N$. 

After discovering that $\sqrt{2}$ is irrational, Greek mathematicians studied the 
particular Pell equation $x^2 - 2y^2 = 1$, in order to approximate and better 
understand the quantity $\sqrt{2}$. They discovered the solutions $x = x_n$ and $y = y_n$ 
we found in Section 1.5 and called them “side and diagonal numbers” because 
the ratio $x_n / y_n$ approaches the ratio of the diagonal to the side of a square. (And 
perhaps also because they took $x_n, y_n$ as diagonal and side of an “approximate 
square,” from which they constructed a better approximation $x_{n+1}, y_{n+1}$.) 
Today we would relate their process to the continued fraction for $\sqrt{2}$. 

The Indian mathematicians Brahmagupta (around 600 CE) and Bhaskara 
II (around 1150 CE) found methods for solving $x^2 - Ny^2 = 1$ for positive 
integers $N$, which, in the opinion of Weil (1984), came close to a general 
solution. Bhaskara II was able to solve the particularly hard case $N = 61$, 
whose smallest positive solution is $x = 1766319049$, $y = 226153980$. Fermat 
(1657) independently noted this case (great minds think alike?), so he probably 
had a general method for solving $x^2 - Ny^2 = 1$ as well. However, a general 
solution was first published by Lagrange (1768), using continued fractions. 
The key step is to show that the continued fraction for $\sqrt{N}$ is ultimately 
periodic for any nonsquare positive integer $N$. This property can be seen in the continued 
fraction for $\sqrt{2}$ in Exercise 5 of Section 1.4, where all quotients after the first 
are equal to 2. 
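
The solution quoted for $N = 61$ can be checked with a short computation. The sketch below uses the standard integer recurrence for the continued fraction of $\sqrt{N}$ and tests each convergent in turn. It is a common textbook procedure and is not claimed to be the method of Bhaskara II, Fermat, or Lagrange; the function name `pell_fundamental` is an illustrative choice.

```python
from math import isqrt

def pell_fundamental(N):
    """Smallest positive solution (x, y) of x^2 - N*y^2 = 1, N a nonsquare."""
    a0 = isqrt(N)
    if a0 * a0 == N:
        raise ValueError("N must not be a perfect square")
    m, d, a = 0, 1, a0             # state of the continued fraction expansion
    h_prev, h = 1, a0              # numerators of successive convergents
    k_prev, k = 0, 1               # denominators of successive convergents
    while h * h - N * k * k != 1:
        m = d * a - m
        d = (N - m * m) // d
        a = (a0 + m) // d          # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

print(pell_fundamental(2))
print(pell_fundamental(61))
```

All arithmetic is on integers, so the ten-digit answer for $N = 61$ is exact.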




Figure 1.2 Niccolo Fontana (1499-1557) and Gerolamo Cardano (1501-1576) 
(both public domain). 


1.9.2 Algebra as Symbolic Calculation 


The first step towards algebra as an independent discipline was the use of symbols 
to denote numbers, and extension of addition, multiplication, and so on 
from numbers to symbols. Rudimentary symbolism was used by Diophantus 
and later by al-Khwarizmi, whose work gave us the name “algebra” from the 
Arabic word “al-jabr” meaning something like “rearrangement of parts.” (For 
a long time the word “algebra” was also used in Europe to mean resetting of 
broken bones.) 

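Algebra took off in Europe after Italian mathematicians in the early 16th 
century discovered the solution of cubic equations. Specifically, 

\[ x^3 = px + q \quad\text{has solution}\quad 
x = \sqrt[3]{\frac{q}{2} + \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}} 
  + \sqrt[3]{\frac{q}{2} - \sqrt{\left(\frac{q}{2}\right)^2 - \left(\frac{p}{3}\right)^3}}. \]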


Essentially, this solution was discovered by Scipione del Ferro soon after 1500. 
It was rediscovered by Niccolo Fontana (known as Tartaglia) in the 1530s, and 
published by Cardano (1545). 
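
As a worked instance of the cubic solution, take $x^3 = 6x + 9$, so $p = 6$ and $q = 9$. The sketch below evaluates the sum of two cube roots of $q/2 \pm \sqrt{(q/2)^2 - (p/3)^3}$ in floating point. It is restricted to the case where the inner square root is real, an assumption made to keep the example short, and the function names are hypothetical.

```python
from math import sqrt

def cbrt(t):
    """Real cube root, valid for negative t as well."""
    return -((-t) ** (1 / 3)) if t < 0 else t ** (1 / 3)

def cardano_root(p, q):
    """A real root of x^3 = p*x + q, assuming (q/2)^2 >= (p/3)^3."""
    disc = (q / 2) ** 2 - (p / 3) ** 3
    if disc < 0:
        raise ValueError("inner square root is not real in this case")
    s = sqrt(disc)
    return cbrt(q / 2 + s) + cbrt(q / 2 - s)

x = cardano_root(6, 9)             # x^3 = 6x + 9
print(x, x ** 3 - (6 * x + 9))     # the second number should be close to 0
```

Here $(q/2)^2 - (p/3)^3 = 20.25 - 8 = 12.25$, whose square root is $3.5$, so the two cube roots are those of $8$ and $1$, and the root is $2 + 1 = 3$.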

Until this time, algebraists interpreted algebra geometrically — for example, 
“completing the square” meant exactly that — and they used geometric proofs 
in the manner of Euclid. But the solution of cubic equations went beyond 
anything done by the Greeks, or anyone else, and it galvanized the European 
mathematicians. Cardano wrote: 


In our own days Scipione del Ferro of Bologna has solved the case of the cube and 
first power equal to a constant, a very elegant and admirable accomplishment. 
Since this art surpasses all human subtlety and the perspicuity of mortal talent and 
is a truly celestial gift and a very clear test of the capacity of men’s minds, whoever 
applies himself to it will believe that there is nothing he cannot understand. 


Figure 1.3 François Viète (1540-1603) and René Descartes (1596-1650). 
Bettman via Getty Images and via Getty Images, respectively. 


By the end of the 16th century, European mathematicians were confidently 
doing algebra by symbolic calculations alone, and indeed with much the same 
notation used in high school algebra today. This notation was introduced by 
Viète (1591), and Descartes (1637) used it to actually reverse the positions 
of algebra and geometry, using algebra to prove results in geometry. This 
algebraic geometry of Descartes paved the way for calculus and differential 
geometry, in which the concept of “calculation” is extended to certain infinite 
processes, such as infinite sums. This extension goes beyond what is normally 
considered algebra, so we will not pursue it further in this book, but it can be 
viewed historically as a natural outcome of the algebra of Viète and Descartes. 

Indeed, we will not even pursue the operations of square root and cube 
root that were of such interest to the early algebraists. Although we have used 
numbers such as $\sqrt{2}$, and will continue to do so, we will not need to invoke any 
infinite process, such as the continued fraction, in order to use $\sqrt{2}$ effectively 
in algebra. We will see that to “know” $\sqrt{2}$ algebraically, it is enough to know 
the concepts of sum, product, and congruence for polynomials. And these 
concepts are virtually the same as the corresponding concepts for integers, 
which we studied in Section 1.7. 


1.9.3 The Binomial Theorem 


One of the first important theorems of algebra is the binomial theorem: 

\[ (a+b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{2}a^{n-2}b^2 + \cdots + nab^{n-1} + b^n, \]

where the coefficient of $a^{n-k}b^k$ is the so-called binomial coefficient 

\[ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 2\cdot 1} = \frac{n!}{(n-k)!\,k!}, \]

also known as “$n$ choose $k$.” 
The simplest examples are easily calculated: 

\begin{align*}
(a+b)^1 &= a + b, \\
(a+b)^2 &= a^2 + 2ab + b^2, \\
(a+b)^3 &= a^3 + 3a^2b + 3ab^2 + b^3, \\
(a+b)^4 &= a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4.
\end{align*}

And the coefficients of any $(a+b)^n$ form the $n$th row of the so-called Pascal’s 
triangle, 

            1   1
          1   2   1
        1   3   3   1
      1   4   6   4   1
    1   5  10  10   5   1

each member of which, except the 1 at each end of each row, is the sum of the 
two above it. 

“Pascal’s triangle” was not discovered by Pascal. Examples of it may be 
seen in Chinese mathematics books centuries before Pascal. The Chinese used 
binomial coefficients to calculate numerical solutions of polynomial equations. 
Their process was later rediscovered under the name of “Horner’s method” in 
Europe. However, Pascal (1654) was the first to systematically prove properties 
of the binomial coefficients, and to do so he used induction in the form 
commonly used today: with a “base step” and an “induction step.” 

For example, to prove the defining property of Pascal’s triangle, one first 
observes that it is true for the second row 1 2 1 representing $(a + b)^2$; the 
2 is indeed the sum of the two 1s above it. This is the base step. 

Figure 1.4 Blaise Pascal (1623-1662). Licensed under Creative Commons Attribution 
3.0 Unported License. 


The induction step is to show that the coefficients in the (n + 1)st row are
sums of those above them in the nth row. Well, the (n + 1)st row consists of
the coefficients of $(a+b)^{n+1}$, and


$$(a+b)^{n+1} = a(a+b)^n + b(a+b)^n.$$


From the latter equation, we see that


$$\begin{aligned}
\binom{n+1}{k} &= \text{coefficient of } a^k b^{n+1-k} \text{ in } (a+b)^{n+1}\\
&= \text{coefficient of } a^k b^{n+1-k} \text{ in } a(a+b)^n\\
&\quad + \text{coefficient of } a^k b^{n+1-k} \text{ in } b(a+b)^n\\
&= \text{coefficient of } a^{k-1} b^{n+1-k} \text{ in } (a+b)^n\\
&\quad + \text{coefficient of } a^k b^{n-k} \text{ in } (a+b)^n\\
&= \binom{n}{k-1} + \binom{n}{k},
\end{aligned}$$


which is the sum of the appropriate coefficients from the row above.

Fermat’s “little theorem,” proved in Section 1.7, was apparently discovered 
through a connection with properties of the binomial coefficients. Details may 
be found in Weil (1984), but the essential idea can be seen by looking at 


$$2^p = (1+1)^p = 1^p + p\cdot 1^{p-1} + \frac{p(p-1)}{2}\,1^{p-2} + \cdots + p\cdot 1 + 1^p.$$

The binomial coefficients on the right-hand side are necessarily integers, so
the denominator of each coefficient $\binom{p}{k}$ divides the numerator. It is clear
from the formula for $\binom{n}{k}$ above that the numerator of $\binom{p}{k}$ contains the factor
p but the denominator does not (its factors are all smaller than the prime p).
Therefore, by unique prime factorization,
p divides each term on the right-hand side, except the first and last.⁴ It
follows that

$$2^p \equiv 2 \pmod{p}.$$

Dividing each side by 2, which is valid when p > 2 because 2 has an inverse
mod p, we find the special case of Fermat’s little theorem:

$$2^{p-1} \equiv 1 \pmod{p}.$$

Fermat in fact proved his little theorem in the form

$$a^p \equiv a \pmod{p},$$

which is valid for an arbitrary integer a.
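
The divisibility argument can be checked numerically for small cases. The sketch below is only an illustration (the helper name `interior_divisible` is ours): it tests that p divides every binomial coefficient strictly inside row p when p is prime, that this fails for some composite row numbers, and that the congruences above hold for the primes tried.

```python
from math import comb

def interior_divisible(p):
    """True if p divides every binomial coefficient C(p, k) with 0 < k < p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))

for p in [2, 3, 5, 7, 11, 13]:
    assert interior_divisible(p)
    # 2^p = 2 (mod p), computed with modular exponentiation
    assert pow(2, p, p) == 2 % p
    # a^p = a (mod p) for a range of nonnegative integers a
    assert all(pow(a, p, p) == a % p for a in range(20))

# Rows with composite index need not have this property:
assert not interior_divisible(4)   # C(4, 2) = 6 is not divisible by 4
assert not interior_divisible(6)   # C(6, 2) = 15 is not divisible by 6
```

A finite check of this kind proves nothing in general, but it shows the phenomenon the proof explains.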


1.9.4 Rings and Fields 


Like any abstraction, the concept of field emerged only after several concrete 
instances of it had been observed: in this case, the field Q of rational numbers, 
and the algebraic number fields obtained from Q by throwing in roots of a 
polynomial in Z[x] and forming all possible sums, differences, products, and 
quotients. This was done by Galois in the late 1820s, as part of his theory 
of equations, which distinguishes the polynomial equations that have solutions 
“by radicals” (such as equations of degree 2 and 3) from those that do not (such 
as certain equations of degree 5). Remarkably, Galois also discovered the finite 
fields $\mathbb{F}_p$. At the time, fields were known as “domains of rationality” because
they admit all the rational operations: addition, subtraction, multiplication, and 
division (by nonzero elements). 

In the 1870s Dedekind singled out the algebraic number fields for special 
attention and gave them the name Körper. This is the German word for body,
which Dedekind thought appropriate because closure under rational operations
makes these domains in some sense complete and self-contained. Later, the
word Körper became not body but field in English, though the ghost of Körper
lingers in the letter K often used to denote a field in English-language algebra 
books. 


4 This can be observed in rows 2, 3, and 5 of the small Pascal’s triangle shown above. It is fun to 
continue the triangle as far as row 7, or row 11, to see how the phenomenon plays out for other 
prime-numbered rows.

Figure 1.5 Richard Dedekind (1831-1916) and Leopold Kronecker (1823-1891).
Licensed under Creative Commons Attribution-ShareAlike 4.0 International
License and courtesy of ETH-Zürich Bibliothek, respectively.


At about the same time Kronecker introduced the “finitary” approach to 
algebraic number fields, using congruence classes of polynomials in place of 
explicit irrational numbers. This approach, as we will see in Sections 4.3 and 
4.4, follows the construction of finite fields via congruence of integers mod p. 
The only difference is that the Euclidean algorithm for integers is replaced 
by the Euclidean algorithm for polynomials. In Kronecker’s construction, 
the prime number p is replaced by a polynomial p(x) that is “prime” (or 
irreducible) in the sense of having no factorization into polynomials of smaller 
degree. 

The concept of ring emerged more slowly than the concept of field, from 
the concept of algebraic integer. Particular algebraic integers were used by 
Euler, who assumed that they behaved “like ordinary integers” to the extent of 
having unique prime factorization, as we will see in Section 2.9. But defining 
the general concept required some care: to obtain enough closure properties 
(under addition, subtraction, and multiplication) to ensure algebraic integers 
behave like ordinary integers, but not so many as to let them behave like 
rational numbers. Eventually, abstracting examples due to Dirichlet, Kummer, 
and Eisenstein, Dedekind came up with the appropriate definition in the 1870s. 
In the process, he shaped the future concept of ring. 

The trick in defining algebraic integers is to take roots of polynomials 
in Z[x] (which define algebraic numbers), but to restrict the polynomials to 
those with leading coefficient 1: the so-called monic polynomials. Eisenstein 
(1850) had already proved that sum, difference, and product of such numbers
is another number of the same type. So the collection of all algebraic integers
is certainly a ring. However, Dedekind saw that a further restriction is needed 
to ensure anything like prime factorization: The algebraic integers should be 
confined to those of an algebraic number field. 

Thus, instead of working with the ring of all algebraic integers, one 
normally chooses an algebraic number field F and works with the ring of 
algebraic integers in F. We will see how this works out in Section 4.6. The 
term ring itself came later, as a shortening of the term number ring (in German, 
Zahlring) used by Hilbert (1897). The general theory of rings begins with 
Emmy Noether (1921), whose work will be discussed further in Chapters 5 
and 9.


Diophantine Arithmetic


Preview 


An equation is called Diophantine if it is a polynomial equation, with integer 
coefficients, for which integer solutions are sought. (Thus it is not really the 
equation that is “Diophantine,” but the solutions.) Famous examples have
already been mentioned in Sections 1.2 and 1.5. The name comes from
Diophantus, who investigated such equations around 200 CE. He was actually
more interested in rational solutions — which are usually easier to find — 
and it was Fermat who directed attention to integer solutions. Fermat was 
nevertheless greatly inspired by Diophantus, who made a few remarks on 
properties of integers, which Fermat realized were well worth pursuing. 

They include properties of sums of two and four squares, and the special
equation $y^3 = x^2 + 2$, whose solution x = 5, y = 3 was mentioned
by Diophantus. Fermat saw that the key to understanding sums of two squares
was finding the primes that are sums of two squares, which he claimed are
those of the form 4n + 1. He also made many other claims, among them that
x = 5, y = 3 is the only positive solution of $y^3 = x^2 + 2$, and the famous
Fermat’s last theorem, stating that the equation $x^n + y^n = z^n$ has positive
integer solutions only when $n \le 2$.

These claims of Fermat inspired Euler, who managed to prove some of them 
in the next century. Among the tools Euler introduced were algebraic integers. 
These proved to be very fruitful for the future of number theory, and also for 
algebra. 


2.1 Rational versus Integer Solutions


Perhaps the first, and certainly the most famous, Diophantine equation is

$$a^2 + b^2 = c^2,$$

relating the sides a, b, and hypotenuse c of a right-angled triangle. Equally
famous are some integer solutions of this equation, such as (a,b,c) = (3,4,5)
and (a,b,c) = (5,12,13), called Pythagorean triples. Finding integer triples
(a,b,c) such that $a^2 + b^2 = c^2$ is essentially equivalent to finding rational
solutions x = a/c and y = b/c of the equation

$$x^2 + y^2 = 1.$$


Geometrically speaking, we seek rational points on the unit circle, and there 
is in fact a geometrical method of finding these points. (It was discovered by 
Diophantus, though he did not describe it geometrically.) 

First we notice certain obvious rational points, such as P = (-1, 0). If we
connect P to another rational point Q, then the line PQ has rational slope. Not
so obviously, the converse is true: if we draw a line through P with rational 
slope t (Figure 2.1), it meets the circle at another rational point Q.



Figure 2.1 Finding the rational points on the unit circle.

There is an algebraic reason why Q must be rational, but we need to find
its specific coordinates in terms of t in order to find a formula for Pythagorean
triples. The reasoning and the calculation go as follows.

The equation of the line PQ is y = t(x + 1), so we find the x-coordinates
of its intersections with the circle by substituting y = t(x + 1) in $x^2 + y^2 = 1$.
The result is the quadratic equation

$$(1+t^2)x^2 + 2t^2x + t^2 - 1 = 0$$

or, equivalently,

$$x^2 + \frac{2t^2}{1+t^2}\,x + \frac{t^2-1}{t^2+1} = 0.$$

Since x = -1 is a solution of this equation (corresponding to the point P), the
left side has a factor (x + 1). The other factor is therefore $\left(x + \frac{t^2-1}{t^2+1}\right)$, which
corresponds to the solution

$$x = \frac{1-t^2}{1+t^2}.$$

This is the x-coordinate of the point Q, and hence its y-coordinate is

$$y = t(x+1) = t\left(\frac{1-t^2}{1+t^2} + 1\right) = \frac{2t}{1+t^2}.$$

If we now substitute an arbitrary fraction q/p for t, we find

$$x = \frac{p^2-q^2}{p^2+q^2}, \qquad y = \frac{2pq}{p^2+q^2}.$$

This means that the original Pythagorean triple (a,b,c) is of the form

$$a = (p^2-q^2)r, \qquad b = 2pqr, \qquad c = (p^2+q^2)r$$

for some natural numbers p, q, r, a result originally found by Euclid.
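
The calculation can be traced with exact rational arithmetic. The following Python sketch is one possible illustration, with function names (`point_on_circle`, `triple_from_slope`) invented for the purpose: it computes the second intersection Q for a rational slope t and clears denominators to obtain a Pythagorean triple.

```python
from fractions import Fraction
from math import gcd

def point_on_circle(t):
    """Second intersection of the line y = t(x + 1) with x^2 + y^2 = 1."""
    t = Fraction(t)
    x = (1 - t * t) / (1 + t * t)
    y = 2 * t / (1 + t * t)
    return x, y

def triple_from_slope(q, p):
    """Integer triple (a, b, c) with a/c, b/c the point Q for slope t = q/p."""
    x, y = point_on_circle(Fraction(q, p))
    # Least common multiple of the two denominators serves as c.
    c = x.denominator * y.denominator // gcd(x.denominator, y.denominator)
    return int(x * c), int(y * c), c

print(triple_from_slope(1, 2))  # (3, 4, 5)
print(triple_from_slope(2, 3))  # (5, 12, 13)
```

Because `Fraction` keeps results in lowest terms, the triple returned has no common factor; the general triple is a multiple of it.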
A very similar argument enables us to find the rational points on the 
hyperbola $x^2 - 2y^2 = 1$. In fact, they are

$$x = \frac{1+2t^2}{1-2t^2}, \qquad y = \frac{2t}{1-2t^2}.$$

But these are of little help in finding integer points. We can find by trial 
that t = 1/2 gives the smallest positive integer point, (x,y) = (3,2), and 
that t = 2/3 gives the second positive integer point (x, y) = (17, 12). But there 
is no telling whether there are infinitely many integer points, or how to describe 
them. For this, as we saw in Section 1.5, algebraic integers are the key. 
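
The two integer points just mentioned can be confirmed with exact fractions. This is a small sketch only; the function name `point_on_hyperbola` is an assumption of ours, and it uses the parametrization by the slope t of the line through (-1, 0).

```python
from fractions import Fraction

def point_on_hyperbola(t):
    """Second intersection of y = t(x + 1) with the hyperbola x^2 - 2y^2 = 1."""
    t = Fraction(t)
    d = 1 - 2 * t * t   # never zero for rational t, since 2 is not a rational square
    return (1 + 2 * t * t) / d, 2 * t / d

assert point_on_hyperbola(Fraction(1, 2)) == (3, 2)
assert point_on_hyperbola(Fraction(2, 3)) == (17, 12)

# Every rational slope gives a rational point on the curve, but rarely an integer one.
x, y = point_on_hyperbola(Fraction(5, 7))
assert x * x - 2 * y * y == 1
print(x, y)
```

The last point printed is rational but not integral, which illustrates why the parametrization alone says little about integer points.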


Exercises


1. Show that the line of rational slope t through the point (-1, 0) on the
hyperbola $x^2 - 2y^2 = 1$ has a second intersection with the hyperbola at

$$x = \frac{1+2t^2}{1-2t^2}, \qquad y = \frac{2t}{1-2t^2}.$$

2. Similarly find all rational points on the ellipse $x^2 + 3y^2 = 1$.


It should now be clear that this method will find all the rational points on a
quadratic curve, as long as the curve has at least one rational point. But some
quadratic curves have no rational points. An example is the circle $x^2 + y^2 = 3$.


3. Show that there are rational points on $x^2 + y^2 = 3$ only if there are
integers a,b,c such that $a^2 + b^2 = 3c^2$ and gcd(a,b,c) = 1.

4. Show that any square is congruent to 1 or 0 mod 4.

5. Deduce that $a^2 + b^2 = 3c^2$ has no integer solution.


A direct approach to Euclid’s solution of $a^2 + b^2 = c^2$ is possible with the
help of unique prime factorization. The main idea is to seek primitive
Pythagorean triples (a,b,c), which are those for which the gcd of a,b,c is 1.


6. By proving each square congruent to either 0 or 1, mod 4, show that
exactly one of a or b is odd, and c is odd, in a primitive Pythagorean triple
(a,b,c).

7. Assuming that b is the even member of the triple, use the equation

$$\left(\frac{b}{2}\right)^2 = \frac{c+a}{2} \cdot \frac{c-a}{2}$$

and unique prime factorization to conclude that $\frac{c+a}{2} = p^2$ and $\frac{c-a}{2} = q^2$
for some integers p,q with gcd(p,q) = 1.
8. Solving for a,b,c, conclude that a primitive Pythagorean triple (a,b,c) has
the form $a = p^2 - q^2$, $b = 2pq$, $c = p^2 + q^2$, where gcd(p,q) = 1.


2.2 Fermat’s Last Theorem for Fourth Powers 


Pythagorean triples are exceptional, according to the famous “last theorem” of 
Fermat. This theorem states that, for any integer n > 2, there are no positive 
integer triples (a,b,c) with $a^n + b^n = c^n$. Although claimed by Fermat, the
theorem was first proved by Wiles (1995). However, Fermat did give a proof
for n = 4, which ingeniously uses Euclid’s formula for Pythagorean triples
and also Euclid’s “infinite descent” form of induction. Here we give a shorter
proof using the same general idea.


Fermat’s last theorem for fourth powers. If a, b, c are positive integers, then
$a^4 + b^4 \ne c^4$.


Proof. Suppose, for the sake of contradiction, that a,b,c are positive integers
with $a^4 + b^4 = c^4$. By dividing through by any common divisors, we can
assume that the only common divisor of a,b,c is 1, in which case the gcd
of $a^2, b^2, c^2$ is also 1. Now the equation $a^4 + b^4 = c^4$ says that $(a^2, b^2, c^2)$ is a
Pythagorean triple, so Euclid’s formula gives

$$a^2 = p^2 - q^2, \qquad b^2 = 2pq, \qquad c^2 = p^2 + q^2, \qquad \text{with } \gcd(p,q) = 1.$$

This is because the gcd of $a^2, b^2, c^2$ is 1 and we can assume, without loss of
generality, that $b^2$ is the even member of the triple.

We now view $a^2 = x$, $b = y$, and $c = z$ as integer solutions of the equation

$$x^2 + y^4 = z^4, \qquad (*)$$

in which case $(x, y^2, z^2)$ is also a Pythagorean triple, with the gcd of $x, y^2, z^2$
equal to 1, and with $y^2$ even.
Then, by Euclid’s formula again, we get positive integers r,s with

$$x = r^2 - s^2, \qquad y^2 = 2rs, \qquad z^2 = r^2 + s^2, \qquad \text{with } \gcd(r,s) = 1.$$

The third equation shows that (r,s,z) is yet another Pythagorean triple, whose
members again have gcd 1, because a common prime divisor of r,s would give
a common prime divisor of $x, y^2, z^2$.

So, by Euclid’s formula yet again, we get positive integers u,v, with gcd 1,
such that either

$$r = u^2 - v^2 \text{ and } s = 2uv \qquad \text{or} \qquad r = 2uv \text{ and } s = u^2 - v^2.$$

In either case,

$$y^2 = 2rs = 4uv(u^2 - v^2) \quad \text{with } \gcd(u,v) = 1.$$

Unique prime factorization then implies that $u$, $v$, and $u^2 - v^2$ are all squares,
say,

$$u = l^2, \qquad v = m^2, \qquad u^2 - v^2 = n^2, \qquad \text{so that} \qquad n^2 + m^4 = l^4.$$

Now the last equation, $n^2 + m^4 = l^4$, has the same form as equation (*).
Also, by tracing back, we see that $l < u < s < z$ or $l < u < r < z$. Thus,
from any positive integer solution z of (*) we find a smaller positive integer
solution l. As Euclid might add, “which is impossible.” Hence, there is no
positive integer solution of $a^4 + b^4 = c^4$. □
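
A brute-force search is no substitute for the descent argument, but it gives a quick sanity check on equation (*). The sketch below, with a function name and search bound chosen arbitrarily by us, looks for positive integers x, y, z satisfying (*) with z up to a given limit and finds none.

```python
from math import isqrt

def solutions(limit):
    """Positive integers (x, y, z) with x^2 + y^4 = z^4 and z <= limit."""
    found = []
    for z in range(1, limit + 1):
        for y in range(1, z):
            x2 = z**4 - y**4          # candidate value of x^2, positive since y < z
            x = isqrt(x2)
            if x * x == x2:
                found.append((x, y, z))
    return found

assert solutions(300) == []
```

The empty result is what the theorem predicts for every limit; the proof is what rules out solutions beyond any search bound.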


Exercises 


The argument above proves incidentally that there is no positive integer
solution of the equation $n^2 = l^4 - m^4$, and, hence, also no negative integer
solution, since all the powers are even.


1. Deduce that there is no nonzero rational solution of $y^2 = 1 - x^4$, by
showing that a nonzero rational solution would give a nonzero integer
solution of $n^2 = l^4 - m^4$.

Jakob Bernoulli (1704) speculated that a similar argument might show there
is no rational function solution of $y^2 = 1 - x^4$, which would explain why it
seems impossible to “rationalize” $\sqrt{1 - x^4}$ by substituting a rational function
for x, a problem he and other pioneers of calculus had met when trying to
evaluate the integral $\int_0^x \frac{dt}{\sqrt{1-t^4}}$. Here is one such argument.

2. Show a solution of $y^2 = 1 - x^4$ by rational functions y = y(t), x = x(t)
gives a solution of $n^2 = l^4 - m^4$ by polynomials l = l(t), m = m(t),
n = n(t).

3. Use unique prime factorization of polynomials, from Section 1.6, to rerun
the argument in the previous exercise set with polynomials in place of
integers. Hence, find a “Euclid’s formula” for polynomials a(t), b(t), c(t)
satisfying $a(t)^2 + b(t)^2 = c(t)^2$.

4. Now carry over the argument for the impossibility of $n^2 = l^4 - m^4$ in
nonzero integers to show its impossibility in nonzero polynomials.


2.3 Sums of Two Squares 


In Book III, Problem 19, of his Arithmetica, Diophantus wrote: 


65 is naturally divided into two squares in two ways, namely into $7^2 + 4^2$ and
$8^2 + 1^2$, which is due to the fact that 65 is the product of 13 and 5, each of which
numbers is the sum of two squares.

It is thought that Diophantus had in mind the two identities

$$\begin{aligned}
(a^2 + b^2)(c^2 + d^2) &= (ac - bd)^2 + (ad + bc)^2 \qquad (*)\\
&= (ac + bd)^2 + (ad - bc)^2,
\end{aligned}$$


which give the two expressions for 65 as a sum of two squares by setting a = 3, 
b = 2,c = 1, d = 2. The general identities were later rediscovered, and 
laboriously proved, by Fibonacci (1225). Today, it is a routine exercise in high 
school algebra to confirm the identities, but they have a deeper meaning in 
connection with complex numbers. 
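
The example of Diophantus is easy to reproduce. In the sketch below (the function name `two_square_forms` is ours, for illustration), the two identities are applied to a = 3, b = 2, c = 1, d = 2.

```python
def two_square_forms(a, b, c, d):
    """The two sums of squares for (a^2 + b^2)(c^2 + d^2) given by the identities (*)."""
    first = (a * c - b * d, a * d + b * c)
    second = (a * c + b * d, a * d - b * c)
    return first, second

(p, q), (r, s) = two_square_forms(3, 2, 1, 2)
assert p * p + q * q == r * r + s * s == 65 == (3**2 + 2**2) * (1**2 + 2**2)
print(sorted(map(abs, (p, q))), sorted(map(abs, (r, s))))  # [1, 8] [4, 7]
```

The first identity yields 65 as the sum of the squares of 8 and 1, the second as the sum of the squares of 7 and 4, matching the quotation.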

If we interpret the “imaginary unit” $i = \sqrt{-1}$ as a unit vector perpendicular
to the real number line, then each number of the form a + bi, where a and b are
real, represents a point in the plane. The point a + bi lies at distance $\sqrt{a^2 + b^2}$
from the origin, by the Pythagorean theorem, and we call this distance the
absolute value |a + bi| of a + bi. It is more convenient for number theory to
use the square of the absolute value, $|a + bi|^2 = a^2 + b^2$, which we call the
norm.

Thus, sums of two squares may be viewed as norms of complex numbers, 
and the Diophantus identity (*) says that the norm is multiplicative. 


Multiplicative property of the norm. For any complex numbers a + bi and
c + di,

$$|(a + bi)(c + di)|^2 = |a + bi|^2\,|c + di|^2.$$

Proof. We expand the product (a + bi)(c + di), using $i^2 = -1$, and apply the
definition of norm as the square of absolute value:

$$\begin{aligned}
|(a + bi)(c + di)|^2 &= |ac - bd + (ad + bc)i|^2\\
&= (ac - bd)^2 + (ad + bc)^2 && \text{by definition of absolute value}\\
&= (a^2 + b^2)(c^2 + d^2) && \text{by the Diophantus identity (*)}\\
&= |a + bi|^2\,|c + di|^2 && \text{by definition of absolute value.} \qquad \square
\end{aligned}$$


The norm $|a + bi|^2$ is convenient for number theory because it is a natural
number for natural numbers a and b, though |a + bi| has greater geometric
meaning. The absolute value acquires a multiplicative property from that of
the norm:

$$|(a + bi)(c + di)| = |a + bi|\,|c + di|.$$


This says that multiplying the set of all complex numbers a+ bi by a 
fixed complex number c + di multiplies all distances by |c + di|, and hence 
preserves shapes. (This idea is worked out more fully in the exercises below.) 
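
Python's built-in complex numbers make the scaling of distances easy to observe. The following is a small numerical sketch under our own choices: the multiplier `u`, the sample points, and the helper name `scaled_distances` are arbitrary, and floating-point results are compared with a tolerance.

```python
import math

u = 3 + 1j                           # the fixed multiplier c + di (arbitrary choice)
points = [0j, 1 + 0j, 1 + 1j, 1j]    # corners of a unit square

def scaled_distances(u, pts):
    """Pairs (distance before, distance after multiplying by u) for all point pairs."""
    out = []
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            before = abs(pts[i] - pts[j])
            after = abs(u * pts[i] - u * pts[j])
            out.append((before, after))
    return out

for before, after in scaled_distances(u, points):
    # Every distance is multiplied by the same factor |u|.
    assert math.isclose(after, abs(u) * before)
```

Since all six distances between the corners are scaled by the same factor, the image of the square is again a square.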

Coming back to number theory, the complex numbers allow us to factorize 
each sum of squares


$$a^2 + b^2 = (a - bi)(a + bi),$$


which draws our attention to numbers of the form a + bi where a and b are
ordinary integers. These complex numbers, like the irrational real numbers
$a + b\sqrt{2}$ studied in Section 1.5, are “algebraic integers” in a sense we will
define in Section 4.6. They are called the Gaussian integers, and they form 
a ring called Z[i]. As in Section 1.5, we will find that these algebraic integers 
have properties, such as conjugates and norm, that reveal hidden properties of 
the ordinary integers. Not surprisingly, the Gaussian integers are particularly 
good at revealing properties of sums of squares. 


Exercises 


Although this is a book about algebra, sometimes algebra has a geometric 
interpretation that is too useful to ignore. We have already seen an example 
in the previous section. Another is the multiplicative property of the norm, 
which has a geometric consequence about preservation of shape that will be 
important in Section 2.5. 


1. Observing that the distance between two points v and w in the plane of
complex numbers is |v - w|, show that multiplication by u sends v and w
to points whose distance apart is |u||v - w|.

2. Deduce that multiplication of all complex numbers by u is a map of the
plane that multiplies all distances by |u|, and hence preserves shape.

3. Show in particular that if |u| = 1, so $u = \cos\theta + i\sin\theta$, then
multiplication by u is a rotation about the origin O through angle $\theta$.

4. Show, more generally, that multiplication by a general complex number
$u = r(\cos\theta + i\sin\theta)$ is a combination of rotation about O through angle
$\theta$ and magnification by r.

5. Show that the unit circle $\{z \in \mathbb{C} : |z| = 1\}$ is divided into n equal parts by
the n powers of $\zeta_n = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$.

6. By solving the equation $z^3 - 1 = 0$, or otherwise, show that $\zeta_3 = \frac{-1 + i\sqrt{3}}{2}$.


2.4 Gaussian Integers and Primes


The Gaussian integers a - bi and a + bi are called conjugates of each other,
and we write $\overline{c + di}$ for the conjugate of c + di for any ordinary integers c and
d. The conjugation operation has the following easily checked properties.


1. $\overline{(a + bi) + (c + di)} = \overline{(a + bi)} + \overline{(c + di)}$.
2. $\overline{(a + bi) \cdot (c + di)} = \overline{(a + bi)} \cdot \overline{(c + di)}$.
3. $(a + bi)\,\overline{(a + bi)} = a^2 + b^2 = |a + bi|^2$.


In particular, the second property is the multiplicative property of conjugation
previously observed for the notion of conjugate used in Section 1.5. Notice
also that properties 2 and 3 give another proof that the norm is multiplicative:


$$\begin{aligned}
|(a + bi)(c + di)|^2 &= (a + bi)(c + di)\,\overline{(a + bi)(c + di)} && \text{by property 3}\\
&= (a + bi)(c + di)\,\overline{(a + bi)} \cdot \overline{(c + di)} && \text{by property 2}\\
&= (a + bi)\,\overline{(a + bi)}\,(c + di)\,\overline{(c + di)}\\
&= |a + bi|^2\,|c + di|^2 && \text{by property 3.}
\end{aligned}$$


There is in fact a general notion of conjugate for algebraic integers, and also 
a general notion of norm, for which analogues of all the above properties hold. 

We can also use the norm to define the concept of Gaussian prime. We 
take the norm of a Gaussian integer to be its squared absolute value. Then a 
Gaussian integer is prime if it has norm greater than 1 and it is not the product
of Gaussian integers of smaller norm. This generalizes an equivalent of the
usual definition (in terms of divisors) for ordinary primes, and it is simpler to
state because it avoids mentioning the eight trivial divisors of each Gaussian
integer $\alpha$: namely, $\pm 1$, $\pm i$, $\pm\alpha$, and $\pm i\alpha$.

It follows, since the norm is an ordinary integer, that we have: 


Existence of prime factorization. Any Gaussian integer of norm greater
than 1 has a Gaussian prime factorization.


Proof. Suppose that $\alpha$ is a Gaussian integer of norm greater than 1. If $\alpha$ is a
Gaussian prime, we are done. If not, suppose $\alpha = \beta\gamma$, where $\beta$ and $\gamma$ are
Gaussian integers of smaller norm greater than 1. Repeat the argument with $\beta$.
This produces a sequence of Gaussian divisors of $\alpha$ with decreasing norms.
Since the norms are natural numbers, this process must terminate, necessarily
with a Gaussian prime divisor p of $\alpha$.

After finding a Gaussian prime divisor p of $\alpha$ by this process, we repeat
the process with the Gaussian integer $\alpha/p$. This leads to a Gaussian prime
factorization of $\alpha$. □

Proving uniqueness of this prime factorization is harder than proving
existence, as it is with ordinary integers. However, the proof depends on the 
same basic idea: the division property. Before proving uniqueness, it may be 
worth looking at some Gaussian integers that factorize, and some that do not. 
The examples show how Gaussian factorization depends on factorization of 
the norm, thanks to the multiplicative property. 


• $2 = (1 - i)(1 + i)$. The factors 1 - i and 1 + i are indeed Gaussian primes,
because they each have norm 2, which is not a product of smaller norms.
Hence, neither 1 - i nor 1 + i is the product of smaller Gaussian integers.

This example shows that an ordinary prime may not be a Gaussian
prime. In fact, any prime that is the sum of two squares, $a^2 + b^2$, splits into
Gaussian factors a - bi and a + bi, and these factors are Gaussian primes
because they have the (ordinary) prime norm $a^2 + b^2$.

• The ordinary prime 3 is a Gaussian prime because its norm $3^2$ splits into
smaller factors only as $3 \cdot 3$, and 3 is not the norm of a Gaussian integer
because $3 \ne a^2 + b^2$ for ordinary integers a, b.

• The Gaussian integer 3 + i has norm $10 = 2 \cdot 5$, so any Gaussian factors of
3 + i must have norms $2 = 1^2 + 1^2$ and $5 = 2^2 + 1^2$. Those with
norm 2 are the numbers $\pm 1 \pm i$ and those with norm 5 are either $\pm 2 \pm i$ or
$\pm 1 \pm 2i$; all are Gaussian primes. By trial and error, we find
$3 + i = (1 - i)(1 + 2i)$.
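
The existence proof is effectively an algorithm, and the trial and error in the examples can be mechanized. The sketch below is one naive way to do it, not an efficient method: Gaussian integers are represented as pairs (real part, imaginary part), and all function names are our own. A proper divisor is sought among candidates with nonnegative coordinates, which suffices because every nonzero Gaussian integer has a unit multiple of that kind.

```python
from math import isqrt

def norm(z):
    a, b = z
    return a * a + b * b

def exact_div(z, w):
    """Return z / w as a Gaussian integer, or None if w does not divide z."""
    (a, b), (c, d) = z, w
    n = norm(w)
    # z / w = z * conj(w) / norm(w)
    re, im = a * c + b * d, b * c - a * d
    if re % n == 0 and im % n == 0:
        return (re // n, im // n)
    return None

def proper_divisor(z):
    """A divisor of z with norm strictly between 1 and norm(z), or None."""
    n = norm(z)
    for c in range(isqrt(n) + 1):
        for d in range(isqrt(n) + 1):
            m = c * c + d * d
            # The norm of a divisor must divide the norm of z.
            if 1 < m < n and n % m == 0 and exact_div(z, (c, d)) is not None:
                return (c, d)
    return None

def factorize(z):
    """Gaussian prime factors of z (norm > 1), following the existence proof."""
    d = proper_divisor(z)
    if d is None:
        return [z]                      # no proper divisor: z is a Gaussian prime
    return factorize(d) + factorize(exact_div(z, d))

print(factorize((3, 1)))  # [(1, 1), (2, -1)]
```

The factors found for 3 + i are 1 + i and 2 - i, which differ from the factors 1 - i and 1 + 2i given above only by unit factors.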


Exercises 


1. Use the existence of Gaussian prime factorization to prove that there are 
infinitely many Gaussian primes. 


Finding the Gaussian prime factors of a Gaussian integer is not much harder 
than finding the ordinary prime factors of an ordinary integer. If the Gaussian 
integer is real, we first find its ordinary prime factors, then split those that are 
sums of two squares. 


2. Find Gaussian prime factors of $30 = 2 \cdot 3 \cdot 5$.

3. Show in general that Gaussian prime factors of an ordinary integer are
ordinary primes that are not sums of two squares and conjugate pairs of
imaginary Gaussian primes.


If the Gaussian integer $\alpha$ has both real and imaginary parts, then its Gaussian
prime factors correspond to ordinary prime factors of its norm, $|\alpha|^2$. If an
ordinary prime factor of the norm is of the form $a^2 + b^2$, then the corresponding
Gaussian prime factor of $\alpha$ is one of $\pm a \pm bi$ or $\pm b \pm ai$, so we have to test a
few (but not too many) possibilities.


4. Find a Gaussian prime factorization of 3 + 4i, and hence show that it is the
square of a Gaussian prime.
5. Find a Gaussian prime factorization of 7 - 11i.


2.5 Unique Gaussian Prime Factorization 


In Z[i] the division property is not as obvious as it is in Z, but it is easy to prove 
by viewing Z[i] as a lattice of points in the plane. Figure 2.2 shows what they 
look like. They are shown as (mainly) white dots, with a few in black or gray.
The black dots are multiples of 3 + i and the gray dot is the number 5 + 3i.

Obviously, the points of Z[i] lie at the corners of a square lattice and,
less obviously, so do the multiples of 3 + i. This is because they result from
multiplying the square lattice Z[i] by 3 + i, which multiplies all distances by
|3 + i| and hence preserves shape. Thus, the multiples of 3 + i lie at the corners
of a lattice of squares of side length |3 + i|. And the number 5 + 3i, which lies
in one of these squares, clearly lies at distance less than |3 + i| from the nearest
corner. 


Figure 2.2 Multiples of 3 + i near 5 + 3i.

More generally, the Gaussian integer multiples $\beta\mu$ of any Gaussian integer
$\beta$ lie at the corners of a lattice of squares of side length $|\beta|$, and any Gaussian
integer $\alpha$ lies at a distance less than $|\beta|$ from the nearest multiple of $\beta$. This
is due to the geometric fact (provable using the Pythagorean theorem) that the
distance from a point in a square to the nearest corner is less than the side
length. Translating this geometric fact into algebraic language, we have:


Division property for Gaussian integers. For any Gaussian integers $\alpha$ and
$\beta \ne 0$ there are Gaussian integers $\mu$ and $\rho$ such that

$$\alpha = \beta\mu + \rho \quad \text{and} \quad |\rho| < |\beta|. \qquad \square$$
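
In computational terms, finding the nearest corner amounts to rounding the exact quotient to a nearest Gaussian integer. The following sketch shows one way to do this with integer arithmetic only; the function name `gauss_divmod` and the representation of Gaussian integers as pairs are our own assumptions.

```python
def gauss_divmod(alpha, beta):
    """Return (mu, rho) with alpha = beta*mu + rho and |rho| < |beta|.

    Gaussian integers are (real, imaginary) pairs of Python ints; beta != (0, 0).
    """
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d                      # norm of beta
    # alpha / beta = alpha * conj(beta) / n, with parts re/n and im/n
    re, im = a * c + b * d, b * c - a * d
    # Round each part to a nearest integer, using exact integer arithmetic.
    m1, m2 = (2 * re + n) // (2 * n), (2 * im + n) // (2 * n)
    # rho = alpha - beta * mu
    rho = (a - (c * m1 - d * m2), b - (c * m2 + d * m1))
    return (m1, m2), rho

mu, rho = gauss_divmod((5, 3), (3, 1))
print(mu, rho)  # (2, 0) (-1, 1)
```

For 5 + 3i and 3 + i this gives the quotient 2 and the remainder -1 + i, whose norm 2 is smaller than the norm 10 of 3 + i. Each coordinate of the exact quotient is off by at most 1/2 after rounding, which is why the remainder is always small enough.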


The division property now allows us to prove unique prime factorization 
for Gaussian integers, much as we did for ordinary integers. The only slight 
differences are:


• We must use division with remainder in the Euclidean algorithm rather than
subtraction. But this still gives the essential consequence that the gcd of
Gaussian integers $\alpha$, $\beta$ has the form $\mu\alpha + \nu\beta$ for some Gaussian integers
$\mu$, $\nu$. This leads to the prime divisor property exactly as in Section 1.3.

• Prime factorization is unique only up to order and factors of $\pm 1$, $\pm i$, since
Gaussian primes that divide each other may differ by the factors $\pm 1$, $\pm i$.
These are the Gaussian integers that divide 1, called units.
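
The first point can be made concrete with an extended Euclidean algorithm for Gaussian integers. The sketch below is an illustration under our own naming (`gmul`, `gsub`, `gdivmod`, `extended_gcd`), with Gaussian integers stored as pairs; it returns a gcd together with coefficients expressing it as a combination of the two inputs. The gcd it finds is determined only up to a unit factor.

```python
def gmul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def gsub(z, w):
    return (z[0] - w[0], z[1] - w[1])

def gdivmod(alpha, beta):
    """Quotient and remainder, rounding alpha/beta to a nearest Gaussian integer."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d
    re, im = a * c + b * d, b * c - a * d          # alpha * conj(beta)
    mu = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))
    return mu, gsub(alpha, gmul(beta, mu))

def extended_gcd(alpha, beta):
    """Return (g, mu, nu) with g = mu*alpha + nu*beta a gcd of alpha and beta."""
    r0, r1 = alpha, beta
    m0, m1 = (1, 0), (0, 0)
    n0, n1 = (0, 0), (1, 0)
    while r1 != (0, 0):
        # Norms of remainders strictly decrease, so the loop terminates.
        q, r = gdivmod(r0, r1)
        r0, r1 = r1, r
        m0, m1 = m1, gsub(m0, gmul(q, m1))
        n0, n1 = n1, gsub(n0, gmul(q, n1))
    return r0, m0, n0

g, mu, nu = extended_gcd((5, 3), (3, 1))
print(g, mu, nu)  # (-1, 1) (1, 0) (-2, 0)
```

Here the gcd of 5 + 3i and 3 + i comes out as -1 + i, equal to 1·(5 + 3i) - 2·(3 + i). Its norm is 2, the common factor of the norms 34 and 10.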


Unique Gaussian prime factorization. Any Gaussian integer of norm
greater than 1 has a factorization into Gaussian primes, which is unique
up to the order of factors and unit factors. □


In the next few sections we will see how properties of the Gaussian 
integers, particularly their prime factorization, reveal properties of the ordinary 
integers, particularly the sums of two squares. This happens because of the 
factorization $a^2 + b^2 = (a - bi)(a + bi)$. In effect, the Gaussian integers give
us a “microscope” able to see properties that ordinary “integer vision” cannot. 


Exercises 


The geometry of multiplication, pointed out in Section 2.3, helps to
answer a geometric question about rational points P on the unit circle: if
P = (a/c, b/c), for ordinary nonzero integers a,b,c, can the angle $\theta$ between
the x-axis and OP be a rational multiple of $\pi$? The question arises because,
seemingly, rational points divide the circle into n equal parts only when
n = 2 or 4.

1. Suppose that P = (a/c, b/c) is a point such that OP makes an angle
$2m\pi/n$ with the x-axis, where $n \ne 2, 4$. In other words,

$$\frac{a}{c} + \frac{b}{c}\,i = \cos\frac{2m\pi}{n} + i\sin\frac{2m\pi}{n}.$$

Deduce, from the geometric interpretation of multiplication, that

$$\left(\frac{a}{c} + \frac{b}{c}\,i\right)^n = 1, \quad \text{or} \quad (a + bi)^n = c^n.$$


2. Deduce from the equation $(a + bi)^n = c^n$, and unique Gaussian prime
factorization, that the Gaussian prime factors of $(a + bi)^n$ differ from
those of $c^n$ by at most the unit factors $\pm 1$ and $\pm i$.

3. Deduce in turn that the Gaussian prime factors of a + bi differ from those 
of c by at most unit factors. 

4. Show, however, that a + bi and c differ by more than unit factors, so we 
have a contradiction. 


The geometric view used to prove the division property in Z[i] can be
adapted to prove the division property, and hence unique prime factorization, in
$\mathbb{Z}[\sqrt{-2}]$, where $\mathbb{Z}[\sqrt{-2}]$ denotes the ring of numbers of the form $a + b\sqrt{-2}$
for $a, b \in \mathbb{Z}$.


5. For each nonzero $\beta \in \mathbb{Z}[\sqrt{-2}]$, the multiples $\beta\mu$ of $\beta$ by all
$\mu \in \mathbb{Z}[\sqrt{-2}]$ form rectangles the same shape as those in $\mathbb{Z}[\sqrt{-2}]$. Why?

6. By considering the distance from any $\alpha \in \mathbb{Z}[\sqrt{-2}]$ to the nearest corner
$\beta\mu$, prove the division property for $\mathbb{Z}[\sqrt{-2}]$.


2.6 Factorization of Sums of Two Squares 


A sum of squares $a^2 + b^2$ of ordinary integers a,b equals the Gaussian integer
product (a - bi)(a + bi), and we now know that a - bi and a + bi have unique
Gaussian prime factorizations. We also know, by the multiplicative property
of conjugates, that we can take the Gaussian prime factors of a - bi to be the
conjugates of the Gaussian prime factors of a + bi.

Then if we pair each Gaussian prime factor $a_k - b_k i$ of a - bi with the
Gaussian prime factor $a_k + b_k i$ of a + bi, we get the sum of squares $a_k^2 + b_k^2$,
which is the norm of $a_k - b_k i$ (and of $a_k + b_k i$). Since $a_k - b_k i$ is a Gaussian
prime, its norm is not a product of smaller norms: that is, $a_k^2 + b_k^2$ is not a
product of smaller sums of squares.

Suppose that $a_k^2 + b_k^2$ is a product of ordinary primes $c_j$ that are not sums
of squares. This is possible if one of $a_k$, $b_k$ equals 0, in which case

$$a_k^2 + b_k^2 = c \cdot c$$

is the unique Gaussian prime factorization (up to unit factors), and hence c is
an ordinary prime that is a Gaussian prime. But if there are two or more $c_j$,
then the product of the $c_j$ has a Gaussian prime factorization different from
$(a_k - b_k i)(a_k + b_k i)$ (even up to unit factors), contrary to uniqueness.

Thus, the factors $a_k^2 + b_k^2$ of $a^2 + b^2$ are ordinary primes except when one
of $a_k$, $b_k$ equals zero and the other equals an ordinary prime c. In this case, c is
a common divisor of a - bi and a + bi, and hence of a and b. So we are led to
the theorem:


Factorization of a sum of two squares. Jf a and b are ordinary integers with 

gcd(a,b) = 1 then a* + b? is a product of ordinary prime sums of squares 
2 2 

a t+ be. Oo 


This theorem explains the example of Diophantus, where 65 is the sum of 
squares in two ways, but it factorizes uniquely into prime sums of squares: 
namely, 13 = 3* + 2? and 5 = 2? + 17. A corollary of the theorem is a result 
discovered by Euler (1747): 


Divisors of a sum of squares. If $a$ and $b$ are ordinary integers with
$\gcd(a,b) = 1$, then any integer divisor of $a^2 + b^2$ greater than 1 is a sum of
two squares.


Proof. Any divisor greater than 1 is a product of prime divisors, which, we
have just seen, are the sums of squares $a_k^2 + b_k^2$. Such products are also sums
of two squares, by the Diophantus identity of Section 2.3. □
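
The two results above can be checked numerically on small cases. The following is a minimal sketch, not part of the original argument; the function names `two_square_reps` and `divisors_are_two_square_sums` are our own, and the search bound of 20 is an arbitrary choice.

```python
from math import gcd, isqrt

def two_square_reps(n):
    """All pairs (a, b) with 0 <= a <= b and a*a + b*b == n."""
    reps = []
    for a in range(isqrt(n // 2) + 1):
        b2 = n - a * a
        b = isqrt(b2)
        if b * b == b2:
            reps.append((a, b))
    return reps

# 65 is a sum of two squares in two ways, while its prime factors
# 5 and 13 are each a sum of two squares in only one way.
print(two_square_reps(65), two_square_reps(5), two_square_reps(13))

def divisors_are_two_square_sums(a, b):
    """Check Euler's corollary for a single coprime pair (a, b)."""
    n = a * a + b * b
    return all(two_square_reps(d) for d in range(2, n + 1) if n % d == 0)

# Spot-check the corollary for all coprime pairs below an arbitrary bound.
assert all(divisors_are_two_square_sums(a, b)
           for a in range(1, 20) for b in range(1, 20) if gcd(a, b) == 1)
```

A finite check like this proves nothing, of course, but it makes the content of the corollary concrete.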


Exercises 


Assuming unique prime factorization in $\mathbb{Z}[\sqrt{-2}]$, which was proved in the
previous exercise set, we can now make a similar investigation of ordinary
integers of the form $a^2 + 2b^2$ and their prime factors.


1. Check that the numbers 34, 54, and 102 are each of the form $a^2 + 2b^2$, and
that each of their prime factors is also of this form.

2. Show the product of two numbers of the form $a^2 + 2b^2$ is also of this form.

3. Explain the latter result in terms of norm and conjugates of numbers of the
form $a + b\sqrt{-2}$.

4. Prove that if $\gcd(a,b) = 1$, then any integer divisor of $a^2 + 2b^2$ greater
than 1 is of the form $c^2 + 2d^2$.


We can also factorize members of $\mathbb{Z}[\sqrt{-2}]$ into primes of $\mathbb{Z}[\sqrt{-2}]$, using
the natural number factors of their norms. For example, $1 + 2\sqrt{-2}$ has norm
$9 = 3 \cdot 3$, which leads to the factorization $-(1 - \sqrt{-2})^2$.


5. Factorize the number $3 + 2\sqrt{-2}$ into primes of $\mathbb{Z}[\sqrt{-2}]$.
6. Show that $5 + \sqrt{-2}$ is the cube of a prime of $\mathbb{Z}[\sqrt{-2}]$.


2.7 Gaussian Primes 


Which ordinary primes $p$ are also Gaussian primes? If $p$ is a sum of two
squares, $a^2 + b^2$ with $a, b \neq 0$, then $p$ has the factorization $(a - bi)(a + bi)$ into
Gaussian integers of smaller norm, so $p$ is not a Gaussian prime. (However,
the factors $a - bi$ and $a + bi$ of $p$ are Gaussian primes, because they have the
ordinary prime $p$ as norm.)

Conversely, if $p = (a + bi)(c + di)$ is a factorization into Gaussian integers
of smaller norm, then $a, b, c, d \neq 0$. Then, conjugating both sides of

$$p = (a + bi)(c + di) \quad\text{gives}\quad p = (a - bi)(c - di),$$

since $p$ is self-conjugate. Multiplying these two equations gives

$$p^2 = (a^2 + b^2)(c^2 + d^2),$$

so we necessarily have $p = a^2 + b^2 = c^2 + d^2$ by unique prime factorization
in the natural numbers. Thus $p$ is a sum of two squares, and we have:


Real Gaussian primes. An ordinary prime $p$ is a Gaussian prime if and only
if $p$ is not a sum of two squares. □


From here it is not far to reach a classification of all Gaussian primes:

Gaussian primes. The Gaussian primes are of two types:


• Real primes $p$ that are not the sum of two squares, and the multiples of such
primes by units.

• The factors $a - bi$ and $a + bi$ of real primes $p = a^2 + b^2$ with $a, b \neq 0$, and
the multiples of such factors by units.


Proof. If $a + bi$ is a Gaussian prime with $a, b \neq 0$, then $a - bi$ is also a
Gaussian prime, since any factors of $a - bi$ give factors of $a + bi$ by conjugation.
Also, $\gcd(a,b) = 1$, otherwise $\gcd(a,b)$ is a real divisor of $a + bi$, again
contrary to $a + bi$ being a Gaussian prime.

Then it follows from the factorization of the sum of two squares in the
previous section that the norm $a^2 + b^2$ of $a + bi$ is an ordinary prime. If
not, $a^2 + b^2$ splits into smaller norms $a_k^2 + b_k^2$, in which case $a + bi$ is not a
Gaussian prime, because it splits into factors $a_k + b_k i$ of smaller norm.

Thus any Gaussian prime is either


• an ordinary prime, and hence not a sum of two squares, or its multiple by a
unit, or else

• one of the Gaussian factors $a - bi$, $a + bi$ of an ordinary prime of the form
$a^2 + b^2$ with $a, b \neq 0$. □


These results lead us to ask: Which ordinary primes are sums of two 
squares? And we are led to believe that the answer has something to do with 
the Gaussian primes. 
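
The classification above translates directly into a test for whether $a + bi$ is a Gaussian prime. The following is a minimal sketch under our own naming (`is_gaussian_prime` and its helpers are illustrative, not from the text), using trial division for ordinary primality.

```python
from math import isqrt

def is_prime(n):
    """Ordinary primality by trial division."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def is_sum_of_two_squares(n):
    return any(isqrt(n - a * a) ** 2 == n - a * a for a in range(isqrt(n) + 1))

def is_gaussian_prime(a, b):
    """Apply the two-type classification to the Gaussian integer a + bi."""
    if a == 0 or b == 0:
        # a + bi is a unit multiple of the ordinary integer |a + b|.
        p = abs(a + b)
        return is_prime(p) and not is_sum_of_two_squares(p)
    # Otherwise a + bi is prime exactly when its norm is an ordinary prime.
    return is_prime(a * a + b * b)

# Ordinary primes below 30 that remain prime in Z[i]:
print([p for p in range(2, 30) if is_gaussian_prime(p, 0)])
```

Notice that the surviving real primes in this range all leave remainder 3 on division by 4, which anticipates the question just raised.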


Exercises 


Thanks to unique prime factorization in $\mathbb{Z}[\sqrt{-2}]$, established in the exercises
to Section 2.5, we can classify the primes of $\mathbb{Z}[\sqrt{-2}]$ in the same way we
classify Gaussian primes.


1. If an ordinary prime $p$ has a factorization into integers of smaller norm in
$\mathbb{Z}[\sqrt{-2}]$,

$$p = (a + b\sqrt{-2})(c + d\sqrt{-2}),$$

show that $p = a^2 + 2b^2 = c^2 + 2d^2$.

2. Deduce that an ordinary prime $p$ is a prime of $\mathbb{Z}[\sqrt{-2}]$ if and only if $p$ is
not of the form $a^2 + 2b^2$.

3. Hence, show that the primes of $\mathbb{Z}[\sqrt{-2}]$, other than $\pm$ ordinary primes,
are the factors $a \pm b\sqrt{-2}$ of ordinary primes of the form $a^2 + 2b^2$.


2.8 Primes that Are Sums of Two Squares 


Apart from the exceptional prime 2, all ordinary primes are odd. Of these, the 
sums of two squares are of the form 4n + 1. This is because any odd sum of 
two squares must be the sum of an even square and an odd square; that is, of 
the form 


$$(2a)^2 + (2b + 1)^2 = 4a^2 + 4b^2 + 4b + 1 = 4(a^2 + b^2 + b) + 1,$$

and this number is of the form $4n + 1$. A famous theorem of Fermat says that
the converse is true: any prime of the form $4n + 1$ is the sum of two squares.

There is a simple proof of Fermat’s theorem using Gaussian integers, due 
to Dedekind, though it depends on the following slightly technical lemma. 


Lagrange’s lemma. If $p = 4n + 1$ is prime, then $p$ divides a number of the
form $m^2 + 1$.


Proof. By Fermat’s little theorem (Section 1.7) we know $x^{p-1} \equiv 1 \pmod{p}$
for any $x \not\equiv 0 \pmod{p}$. This means that

$$0 \equiv x^{p-1} - 1 = x^{4n} - 1 = (x^{2n} - 1)(x^{2n} + 1) \pmod{4n + 1}.$$

We also know, by Lagrange’s theorem on polynomials over a field (Section
1.8) that a polynomial of degree $k$ has at most $k$ roots. Thus the $4n$ nonzero
elements $a$ in $\mathbb{F}_{4n+1}$ can be separated into two groups of $2n$: the $2n$ roots of
$x^{2n} - 1 = 0$ and the $2n$ roots of $x^{2n} + 1 = 0$.

If $a$ is any root of the latter equation, we have, since $4n + 1 = p$,

$$0 \equiv a^{2n} + 1 = (a^n)^2 + 1 \pmod{p}.$$

In other words, if we set $m = a^n$, then $p$ divides $m^2 + 1$. □
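
The proof is constructive enough to compute with. Here is a minimal sketch that follows it step by step: it searches for a root $a$ of $x^{2n} + 1$ modulo $p$ and returns $m = a^n$. The function name `lagrange_m` is our own, and the linear search for $a$ is just the simplest choice.

```python
def lagrange_m(p):
    """For a prime p = 4n + 1, return m with p dividing m*m + 1."""
    if p % 4 != 1:
        raise ValueError("p must be of the form 4n + 1")
    n = (p - 1) // 4
    for a in range(2, p):
        # a is a root of x^(2n) + 1 exactly when a^(2n) is -1 mod p.
        if pow(a, 2 * n, p) == p - 1:
            return pow(a, n, p)  # m = a^n
    raise ValueError("no root found; p is presumably not prime")

for p in (5, 13, 17, 29):
    m = lagrange_m(p)
    print(p, m, (m * m + 1) // p)
```

Since half of the nonzero residues are roots of $x^{2n} + 1$, the search usually succeeds within the first few values of $a$.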


Fermat’s two square theorem. If $p = 4n + 1$ is prime, then $p = a^2 + b^2$ for
some $a, b \in \mathbb{Z}$.


Proof. By Lagrange’s lemma, there is an $m \in \mathbb{Z}$ such that

$$p \text{ divides } m^2 + 1 = (m - i)(m + i).$$

However, $p$ divides neither $m - i$ nor $m + i$ because neither $\frac{m}{p} - \frac{i}{p}$ nor
$\frac{m}{p} + \frac{i}{p}$ is a Gaussian integer. So, by the prime divisor property for Gaussian integers
(Section 2.5), we conclude that $p$ is not a Gaussian prime.
But then $p$ is a sum of two squares, as we saw in the previous section. □
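
The proof shows that $a$ and $b$ exist but does not produce them. One way to find them, which goes beyond the argument in the text, is to compute a greatest common divisor of $p$ and $m + i$ in $\mathbb{Z}[i]$ by the Euclidean algorithm, relying on the division property of Section 2.5. The sketch below assumes this approach; Gaussian integers are represented as pairs `(re, im)`, and all function names are our own.

```python
def g_divmod(alpha, beta):
    """Quotient and remainder in Z[i], rounding the exact quotient."""
    (a, b), (c, d) = alpha, beta
    n = c * c + d * d                      # norm of beta
    re, im = a * c + b * d, b * c - a * d  # alpha * conj(beta)
    q = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))  # nearest integers
    r = (a - (q[0] * c - q[1] * d), b - (q[0] * d + q[1] * c))
    return q, r

def g_gcd(alpha, beta):
    """Euclidean algorithm in Z[i]; the result is determined up to a unit."""
    while beta != (0, 0):
        _, r = g_divmod(alpha, beta)
        alpha, beta = beta, r
    return alpha

def two_squares(p, m):
    """Given a prime p dividing m*m + 1, return (a, b) with a*a + b*b == p."""
    a, b = g_gcd((p, 0), (m, 1))
    return abs(a), abs(b)

print(two_squares(13, 8))   # 13 divides 8*8 + 1 = 65
```

The gcd has norm $p$ because some Gaussian prime factor of $p$ divides $m + i$, while $p$ itself does not.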


Exercises 


It is interesting to see what happens when the above arguments are adapted to
the case of an ordinary prime $p = 3n + 1$. We begin with a counterpart of
Lagrange’s lemma, obtained by imitating the proof above.


1. If $p = 3n + 1$, prove that $p$ divides a number of the form $m^2 + m + 1$.
2. Show that $m^2 + m + 1 = (m - \zeta)(m - \bar{\zeta})$, where $\zeta = \frac{-1 + \sqrt{-3}}{2}$. Also
show that $\bar{\zeta} \in \mathbb{Z}[\zeta]$.
3. Conclude that $p$ is not a $\mathbb{Z}[\zeta]$ prime, and hence is of the form $a^2 - ab + b^2$
(or, equivalently, $a^2 + ab + b^2$) if $\mathbb{Z}[\zeta]$ has unique prime factorization.


But does $\mathbb{Z}[\zeta]$ have unique prime factorization? We will see in the exercises
to the next section.


2.9 The Equation $y^3 = x^2 + 2$


Euler (1770) gave a brilliant argument to show that the only positive integer
solution of $y^3 = x^2 + 2$ is $x = 5$, $y = 3$. (Recall that this was the solution
mentioned by Diophantus and claimed to be the only positive integer solution
by Fermat.) Euler’s idea was to use numbers of the form $a + b\sqrt{-2}$, for $a$,
$b \in \mathbb{Z}$, which he guessed would behave like ordinary integers. This idea is
correct, though not justified by him, and in this section we will see how and
why it works.
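
Before following Euler’s argument, it does no harm to confirm the claim by brute force over a finite range. The sketch below is only a sanity check; the function name `solutions` and the bound 10 000 are our own choices, and no finite search can replace the proof.

```python
from math import isqrt

def solutions(max_y):
    """Positive integer solutions (x, y) of y^3 = x^2 + 2 with y <= max_y."""
    found = []
    for y in range(1, max_y + 1):
        x2 = y ** 3 - 2
        if x2 > 0:
            x = isqrt(x2)
            if x * x == x2:
                found.append((x, y))
    return found

print(solutions(10_000))
```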

Numbers of the form $a + b\sqrt{-2}$ are useful in studying the equation
$y^3 = x^2 + 2$ because they allow it to be written

$$y^3 = (x - \sqrt{-2})(x + \sqrt{-2}).$$

Assuming now that $x$ and $y$ are ordinary integers, Euler guessed that the gcd of
the factors $x - \sqrt{-2}$ and $x + \sqrt{-2}$ on the right is 1. Assuming also that these
factors behave like ordinary integers, he concluded that they must be cubes,
since their product is the cube $y^3$.

If so, let

$$\begin{aligned}
x - \sqrt{-2} &= (a - b\sqrt{-2})^3 \\
&= a^3 - 3a^2 b\sqrt{-2} + 3ab^2(-2) - b^3(-2)\sqrt{-2} \\
&= a^3 - 6ab^2 - (3a^2 b - 2b^3)\sqrt{-2}.
\end{aligned}$$

Equating coefficients of $\sqrt{-2}$, we get

$$1 = 3a^2 b - 2b^3 = b(3a^2 - 2b^2).$$

Thus, $b$ is an ordinary integer that divides 1, so $b = \pm 1$ and hence $b^2 = 1$.
This makes $3a^2 - 2b^2$ (the other divisor of 1) equal to $3a^2 - 2$, which can only
divide 1 if $a = \pm 1$. Since we want a positive value of $x = a^3 - 6ab^2$, we find
by trial that $a = -1$, in which case $x = 5$.

Then we also have $y^3 = x^2 + 2 = 27$, so $y = 3$.


2.9.1 Justification


To clarify the ideas in Euler’s proof, we use the name $\mathbb{Z}[\sqrt{-2}]$ for the ring of
numbers $a + b\sqrt{-2}$ with $a, b \in \mathbb{Z}$, as we have already done for some time in
exercises.

In $\mathbb{Z}[\sqrt{-2}]$ we call $a - b\sqrt{-2}$ the conjugate of $a + b\sqrt{-2}$ because it has
the properties of conjugate listed in Section 2.4. This conjugate is in fact the
usual complex number conjugate. It follows that the norm in $\mathbb{Z}[\sqrt{-2}]$ is

$$|a + b\sqrt{-2}|^2 = (a + b\sqrt{-2})(a - b\sqrt{-2}) = a^2 + 2b^2.$$

This implies that the only numbers of norm 1 in $\mathbb{Z}[\sqrt{-2}]$ are $\pm 1$, and hence
that $\pm 1$ are the only units of $\mathbb{Z}[\sqrt{-2}]$.

It follows from the multiplicative property of conjugation that the norm
is multiplicative, and hence that any divisors of $a + b\sqrt{-2}$ have norms that
divide $a^2 + 2b^2$.

Now we can deal with the first important claim in Euler’s proof: that the gcd
of $x + \sqrt{-2}$ and $x - \sqrt{-2}$ is 1. This is actually not true for all ordinary integers
$x$: for example, when $x = 2$, the gcd is $\sqrt{-2}$. But it is true if $x$ is odd, so first
we have to check that $x$ is odd in any solution of $y^3 = x^2 + 2$. Well, if $x$ is
even, then $x^2 + 2 = y^3$ is even, so $y$ is even. Suppose $y = 2m$, in which case
$y^3 = 8m^3$ is divisible by 8. But $x^2 + 2 = (2n)^2 + 2 = 4n^2 + 2$ is not divisible
by 4, let alone 8, so we have a contradiction.

Having established that $x$ is odd, we see that the norm $x^2 + 2$ of $x + \sqrt{-2}$ is
odd, so $x + \sqrt{-2}$ is not divisible by any number with even norm. But any
common divisor of $x - \sqrt{-2}$ and $x + \sqrt{-2}$ divides their difference, $2\sqrt{-2}$, whose norm 8 is
not divisible by any odd natural number except 1. Thus, 1 is indeed the gcd of
$x - \sqrt{-2}$ and $x + \sqrt{-2}$ when $y^3 = x^2 + 2$.

The second important claim is that $x - \sqrt{-2}$ and $x + \sqrt{-2}$ are both cubes.
Given that $x - \sqrt{-2}$ and $x + \sqrt{-2}$ have no prime factor in common, this
follows by distributing the prime factors of $y^3$, each of which occurs three
times, between $x - \sqrt{-2}$ and $x + \sqrt{-2}$. Because, assuming unique prime
factorization, each prime factor in $x - \sqrt{-2}$ occurs three times, and so does
each prime factor in $x + \sqrt{-2}$. This implies that $x - \sqrt{-2}$ and $x + \sqrt{-2}$ are
both cubes. (We leave it to the reader to check that the units $\pm 1$ are harmless.)

Thus, the second claim depends on proving unique prime factorization for
$\mathbb{Z}[\sqrt{-2}]$. Fortunately, this is almost the same as the proof for $\mathbb{Z}[i]$ in Section
2.5. Recall that it was sufficient to prove the division property, which, for
$\mathbb{Z}[\sqrt{-2}]$, reads:


Division property for $\mathbb{Z}[\sqrt{-2}]$. For any $\alpha, \beta \in \mathbb{Z}[\sqrt{-2}]$ with $\beta \neq 0$ there
are $\mu, \rho \in \mathbb{Z}[\sqrt{-2}]$ such that

$$\alpha = \beta\mu + \rho \quad\text{and}\quad |\rho| < |\beta|.$$


[Figure 2.3: A typical rectangle in the lattice of multiples $\mu\beta$.]


Proof. Recall also the argument for the division property: the set of multiples
$\mu\beta$ forms a lattice of the same shape as (in this case) $\mathbb{Z}[\sqrt{-2}]$, but magnified by
$|\beta|$. This lattice consists of rectangles whose short side is $|\beta|$ and whose long
side is $|\beta|\sqrt{2}$. Figure 2.3 shows one of them.

The argument concludes by saying that any point $\alpha$ in such a rectangle lies
at distance $|\rho| < |\beta|$ from the nearest corner. This is not quite as clear as it was
for the square lattice. But it can be seen by considering the point most distant
from the corners, namely the center point, which is at distance

$$\sqrt{\left(\frac{|\beta|}{2}\right)^2 + \left(\frac{|\beta|\sqrt{2}}{2}\right)^2} = |\beta|\sqrt{\frac{1}{4} + \frac{1}{2}} < |\beta|$$

by the Pythagorean theorem. □
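
The geometric argument corresponds to a simple computation: form the exact quotient $\alpha/\beta$, round each coordinate to the nearest integer to get $\mu$, and take $\rho = \alpha - \beta\mu$. The following is a minimal sketch under our own conventions: a pair `(a, b)` stands for $a + b\sqrt{-2}$, and the names `divmod_sqrt_m2` and `norm` are illustrative.

```python
def norm(z):
    """Norm a^2 + 2b^2 of a + b*sqrt(-2)."""
    return z[0] ** 2 + 2 * z[1] ** 2

def divmod_sqrt_m2(alpha, beta):
    """Return (mu, rho) with alpha = beta*mu + rho and norm(rho) < norm(beta)."""
    (a, b), (c, d) = alpha, beta
    n = norm(beta)
    # alpha * conj(beta) = (a + b s)(c - d s), where s*s = -2
    re, im = a * c + 2 * b * d, b * c - a * d
    # Nearest lattice point to the exact quotient (re + im s) / n
    mu = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))
    # rho = alpha - mu * beta
    rho = (a - (mu[0] * c - 2 * mu[1] * d), b - (mu[0] * d + mu[1] * c))
    return mu, rho

mu, rho = divmod_sqrt_m2((7, 5), (2, 1))
print(mu, rho, norm(rho), norm((2, 1)))
```

Rounding puts each coordinate of the error within $\frac{1}{2}$, so the norm of $\rho$ is at most $\left(\frac{1}{4} + \frac{2}{4}\right)$ times the norm of $\beta$, in agreement with the center-point estimate in the proof.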


Exercises 
The above argument does not work for $\mathbb{Z}[\sqrt{-3}]$.


1. Show that the division property fails in $\mathbb{Z}[\sqrt{-3}]$ by finding the distance
from the center point of a rectangle with sides 1 and $\sqrt{3}$ to the nearest
corner.

2. Show directly that unique prime factorization fails in $\mathbb{Z}[\sqrt{-3}]$ by finding
two distinct prime factorizations of 4.


However, a geometric argument for the division property works in $\mathbb{Z}[\zeta]$,
where

$$\zeta = \frac{-1 + \sqrt{-3}}{2},$$

and hence $\mathbb{Z}[\zeta]$ has unique prime factorization.


3. Show that the points 0, 1, and $1 + \zeta$ lie at the corners of an equilateral
triangle in the plane of complex numbers, and hence that $\mathbb{Z}[\zeta]$ has the
shape of a tiling of the plane by equilateral triangles.

4. Use an argument about shape to prove that the division property, and hence
unique prime factorization, holds in $\mathbb{Z}[\zeta]$.

5. Show that the extra points of $\mathbb{Z}[\zeta]$ lie at the center points of the rectangles
in $\mathbb{Z}[\sqrt{-3}]$.


We will see later (in Section 4.6) that it is appropriate to view the number $\zeta$
as an “algebraic integer.”


2.10 Discussion 


It is hard to exaggerate the influence of Diophantus on the later development 
of arithmetic and algebra. His work, the Arithmetica, contains only a series of 
examples of special problems and special solutions, yet the problems seem to 
have been chosen to illustrate general principles. His most important followers, 
Fermat and Euler, certainly understood them in this way. Fermat drew general
conclusions from the special cases given by Diophantus and claimed to have 
proved them, though in most cases he did not reveal his proofs. Euler gave 
the first published proofs of several of Fermat’s generalizations of Diophantus, 
and those he could not prove were a spur to later mathematicians. Indeed, 
even the results Euler proved were revisited by his successors — such as 
Lagrange, Gauss, Dirichlet, and Dedekind — in order to test new and more 
general methods. 

The works of Diophantus may be seen in English translation in Heath 
(1964), and his influence on Fermat and Euler is traced in Weil (1984). 


2.10.1 The Chord and Tangent Methods of Diophantus 


Most problems in Diophantus are about finding rational solutions to equations,
and it is the handful of problems about integer solutions that are most relevant
to the subject of this book. Nevertheless, some of his methods for finding
rational solutions are remarkable, and worth at least a brief discussion.

The first, called the chord method, was used in Section 2.1 to find the
rational points on the circle, and hence to find a formula for Pythagorean
triples. The outcome in this case was indeed a result about integer solutions,
namely the solutions of $a^2 + b^2 = c^2$. However, this happened only because
$a^2 + b^2 = c^2$ is homogeneous in the variables $a, b, c$, so an integer solution
$(a, b, c)$ corresponds to a rational solution $x = a/c$, $y = b/c$ of $x^2 + y^2 = 1$.

The chord method works in general to find rational points on any quadratic
curve $p(x, y) = 0$, where $p(x, y)$ is a quadratic polynomial with rational
coefficients, provided the curve has a rational point $P$. We can then find
any other rational point by drawing a line (a “chord”) of rational slope $t$
through $P$ and finding the other point $Q$ where it meets the curve. Substituting
the equation of the chord in $p(x, y) = 0$ gives a quadratic equation $q(x) = 0$,
the two solutions of which represent $P$ and $Q$. These solutions also correspond
to factors of $q(x)$; one of them is necessarily rational, since it represents $P$, so
the other is also rational, which implies $Q$ is rational. Thus, we find, as in
the case of the circle, that the rational points correspond one-to-one with the
rational values of $t$.
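
For the circle $x^2 + y^2 = 1$ the chord method can be carried out explicitly. The sketch below assumes the base point $P = (-1, 0)$, which is our choice for illustration (the point used in Section 2.1 may differ), and the function name `circle_point` is our own. Substituting $y = t(x + 1)$ into the circle equation and removing the known factor $x + 1$ leaves a linear equation for the other point $Q$.

```python
from fractions import Fraction

def circle_point(t):
    """Second intersection Q of x^2 + y^2 = 1 with the chord y = t(x + 1)."""
    t = Fraction(t)
    x = (1 - t * t) / (1 + t * t)
    y = 2 * t / (1 + t * t)
    return x, y

for t in (Fraction(1, 2), Fraction(2, 3), Fraction(1, 4)):
    x, y = circle_point(t)
    print(t, x, y, x * x + y * y)
```

Each rational slope gives a rational point, and clearing denominators gives a Pythagorean triple: slope $1/2$ yields $(3, 4, 5)$ and slope $2/3$ yields $(5, 12, 13)$.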

The algebraic key to this argument is that $q(x) = 0$ is a quadratic equation
with rational coefficients, and hence if one factor of $q(x)$ is rational, so is the
other. But this works only for a polynomial $q(x)$ of degree 2. If we have a
cubic curve $p(x, y) = 0$ and we know two rational points $P$, $Q$ on the curve,
then a similar argument shows that a line through $P$ and $Q$ meets the curve in
a third rational point $R$. (Because if $r(x)$ is a cubic polynomial with rational
coefficients and two rational factors, then the third factor is also rational.)

But what if we know only one rational point $P$ on a cubic curve
$p(x, y) = 0$? This is where geometry meets algebra in an interesting way,
and it leads to the tangent method. Substituting the equation of a line of
rational slope $t$ through $P$ in the equation $p(x, y) = 0$ gives a cubic equation
$r(x) = 0$ with rational coefficients. So far, so good, but we know only one
factor of $r(x)$: the rational factor corresponding to $P$. How can we turn one
factor into two? The answer is by taking the line through $P$ to be the tangent.

The tangent at $P$ is the limit of a line through $P$ that meets the curve at
a nearby point $P'$. While $P$ and $P'$ are separate, they correspond to separate,
but nearly equal, factors of the cubic polynomial $r(x)$. When $P'$ merges
with $P$, bingo! The two factors become equal, so $r(x)$ now has two rational
factors, and hence its third factor (corresponding to the other point $Q$ where
the tangent at $P$ meets the curve) is also rational. This gives a second rational
point, $Q$.

An example by Diophantus himself is in the Arithmetica, Book VI, Problem
18: the curve $y^2 = x^3 - 3x^2 + 3x + 1$. This equation has the obvious rational
solution $x = 0$, $y = 1$. Diophantus makes the substitution $y = \frac{3}{2}x + 1$.
Probably he did so because he could foresee the cancellation of $3x + 1$ that leads to
the equation

$$x^3 - \frac{21}{4}x^2 = 0,$$

and hence to another rational point with $x = 21/4$. But $y = \frac{3}{2}x + 1$ is none
other than the tangent at the point $(0, 1)$, so the geometric argument above
explains why it gives another rational point (it also explains the two
factors $x$). Whether or not Diophantus thought this way, the method is now
called the “Diophantus tangent method.”
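
The tangent computation for this example can be checked with exact rational arithmetic. The following is a minimal sketch for curves of the shape $y^2 = x^3 + ax^2 + bx + c$; the function name `tangent_point` and its parameter names are our own. It uses the fact that the three roots of the cubic obtained by substitution sum to $t^2 - a$, where $t$ is the slope of the tangent, and that the point of tangency accounts for two of them.

```python
from fractions import Fraction

def tangent_point(a, b, c, x0, y0):
    """Tangent method on y^2 = x^3 + a x^2 + b x + c at the point (x0, y0)."""
    x0, y0 = Fraction(x0), Fraction(y0)
    if y0 == 0 or y0 ** 2 != x0 ** 3 + a * x0 ** 2 + b * x0 + c:
        raise ValueError("need a point on the curve with y0 != 0")
    # Slope of the tangent, from 2 y y' = 3 x^2 + 2 a x + b.
    t = (3 * x0 ** 2 + 2 * a * x0 + b) / (2 * y0)
    # x0 is a double root of the substituted cubic, so the third root is:
    x1 = t * t - a - 2 * x0
    y1 = y0 + t * (x1 - x0)
    return t, (x1, y1)

# The curve y^2 = x^3 - 3x^2 + 3x + 1 with the rational point (0, 1).
t, (x1, y1) = tangent_point(-3, 3, 1, 0, 1)
print(t, x1, y1)
```

For the curve above the slope comes out as $3/2$ and the new point as $(21/4, 71/8)$; one can confirm directly that $(71/8)^2 = 5041/64$ is the value of the cubic at $x = 21/4$.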


2.10.2 Absolute Value 
The Diophantus identity of Section 2.3,

$$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2,$$

was of course interesting to him mainly as a fact about sums of squares.
Its interpretation as a fact about complex numbers and their absolute values
lay far in the future.¹ Nevertheless, Diophantus also had a two-dimensional
interpretation in mind, since he associated any sum of squares $a^2 + b^2$ with
the right-angled triangle with base $a$ and height $b$ (as would anyone whose
mathematical education emphasized the Pythagorean theorem!).

The triangle interpretation adds a little content to the Diophantus identity,
by associating with two given triangles a “product triangle” whose hypotenuse
is the product of the hypotenuses of the given triangles. Figure 2.4 shows this
interpretation visually. But this picture contains more than a product of lengths:
there is also a sum of angles. The “product triangle” has base $ac - bd$ and height
$ad + bc$, so the slope of its hypotenuse is

$$\frac{ad + bc}{ac - bd} = \frac{\frac{b}{a} + \frac{d}{c}}{1 - \frac{b}{a} \cdot \frac{d}{c}} = \frac{\tan\theta + \tan\varphi}{1 - \tan\theta\tan\varphi}, \qquad (*)$$

where $\theta$ and $\varphi$ are the angles in the given triangles. If one recalls the formula
from trigonometry,

$$\frac{\tan\theta + \tan\varphi}{1 - \tan\theta\tan\varphi} = \tan(\theta + \varphi),$$

then (*) says that the angle of the “product triangle” is the sum of the angles
in the given triangles. Figure 2.5 shows this additional information, which was
apparently first observed by Viète (1615).

[Figure 2.5: Angles in the product of triangles.]

¹ The interpretation of $a + ib$ as the point $(a, b)$ in the plane is credited to Wessel (1797), and it
was rediscovered shortly thereafter by Argand and Gauss.
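
Both properties of the “product triangle” (hypotenuses multiply, angles add) are easy to confirm numerically. The sketch below is only an illustration; the function name `product_triangle` and the sample triangles with sides $(2, 1)$ and $(3, 2)$ are our own choices.

```python
from math import atan2, hypot, isclose

def product_triangle(a, b, c, d):
    """Base and height of the product of triangles (a, b) and (c, d)."""
    return a * c - b * d, a * d + b * c

a, b, c, d = 2, 1, 3, 2
base, height = product_triangle(a, b, c, d)

# The Diophantus identity: the squared hypotenuses multiply exactly.
assert base ** 2 + height ** 2 == (a * a + b * b) * (c * c + d * d)
# The hypotenuse of the product is the product of the hypotenuses ...
assert isclose(hypot(base, height), hypot(a, b) * hypot(c, d))
# ... and its angle is the sum of the two given angles.
assert isclose(atan2(height, base), atan2(b, a) + atan2(d, c))
print(base, height)
```

Here the product triangle has base 4 and height 7, and $4^2 + 7^2 = 65 = 5 \cdot 13$.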

We can now see that the Diophantus identity has rich two-dimensional
content. So, in a sense, the mathematical community was ready for the geometric
interpretation of complex numbers by the time of Viète. The opportunity was
missed, but we can only speculate why; perhaps the time was not yet ripe for
an algebraic approach to geometry, which began only with Descartes (1637).


2.10.3 Norms 


We have seen in this chapter that, for most applications to number theory, the 
squared absolute value, the norm, is more useful than the absolute value itself. 
This is because the norm is a natural number, so it reduces certain questions 
about algebraic integers to questions about natural numbers. In particular, we 
can exploit the fact that any set of natural numbers has a smallest member, and 
we can use prime factorization of natural numbers to discover factorizations of 
algebraic integers. 

The parallel between natural number factorization and algebraic integer
factorization occurs because of the multiplicative property of the norm:

$$\operatorname{norm}(\mu\nu) = \operatorname{norm}(\mu)\operatorname{norm}(\nu).$$

In the case of Gaussian integers $\mu = a + ib$, $\nu = c + id$, this is just a
restatement of the Diophantus identity. We saw in Section 2.4 how to use the
multiplicative property to find factorizations of Gaussian integers and, in
particular, to recognize Gaussian primes. It is a similar story in $\mathbb{Z}[\sqrt{-2}]$,
where the norm is again just the square of the absolute value, and hence its
multiplicative property follows from the multiplicative property of absolute
value for complex numbers.

However, the multiplicative property of the norm extends to algebraic
integers that are strictly real, such as the integers $a + b\sqrt{2}$ we investigated in
connection with the special Pell equation $x^2 - 2y^2 = 1$. The norm of $a + b\sqrt{2}$
is defined to be the product of the conjugates $a + b\sqrt{2}$ and $a - b\sqrt{2}$, namely

$$\operatorname{norm}(a + b\sqrt{2}) = (a + b\sqrt{2})(a - b\sqrt{2}) = a^2 - 2b^2.$$

Its multiplicative property then follows from the multiplicative property of
conjugation, proved in Section 1.5.

The same is true of the algebraic integers $a + b\sqrt{N}$ for any nonsquare
integer $N$. We define

$$\operatorname{norm}(a + b\sqrt{N}) = (a + b\sqrt{N})(a - b\sqrt{N}) = a^2 - Nb^2,$$

and we can prove

$$\operatorname{norm}\left((a + b\sqrt{N})(c + d\sqrt{N})\right) = \operatorname{norm}(a + b\sqrt{N})\operatorname{norm}(c + d\sqrt{N}).$$

This follows, again, by a multiplicative property of the conjugation operation,
which in this case sends $a + b\sqrt{N}$ to $a - b\sqrt{N}$.


https://doi.org/10.1017/97810090041 38.004 Published online by Cambridge University Press 




If we calculate the norms in the latter equation, we find, since

$$(a + b\sqrt{N})(c + d\sqrt{N}) = ac + bdN + (ad + bc)\sqrt{N},$$

an identity analogous to (and a generalization of) the Diophantus identity:

$$(ac + Nbd)^2 - N(ad + bc)^2 = (a^2 - Nb^2)(c^2 - Nd^2). \qquad (**)$$

The latter identity involves only real numbers, so it can be verified by
expanding both sides and comparing the results. In fact it was discovered for
positive nonsquare $N$ by Brahmagupta around 600 CE, and he used it to find
solutions of the general Pell equation

$$x^2 - Ny^2 = 1.$$


In particular, it follows from (**) that from two solution pairs (a, b) and (c,d) 
of the Pell equation, we can calculate a third, namely (ac + Nbd,ad + bc). 
For more on Brahmagupta and other Indian investigators of the Pell equation, 
see Weil (1984). 
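
The composition rule is easy to try out. The following is a minimal sketch that starts from the solution $(3, 2)$ of $x^2 - 2y^2 = 1$ and repeatedly composes it with itself; the function names `compose` and `pell_solutions` are our own.

```python
def compose(N, s, t):
    """Combine pairs s = (a, b) and t = (c, d) by the rule from (**)."""
    (a, b), (c, d) = s, t
    return a * c + N * b * d, a * d + b * c

def pell_solutions(N, first, count):
    """The first `count` solutions of x^2 - N y^2 = 1 generated from `first`."""
    sols, cur = [], first
    for _ in range(count):
        sols.append(cur)
        cur = compose(N, cur, first)
    return sols

for x, y in pell_solutions(2, (3, 2), 4):
    print(x, y, x * x - 2 * y * y)
```

Each new pair again satisfies the equation because, by (**), its norm is the product of the norms of the pairs composed, namely $1 \cdot 1$.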

So far we have discussed only algebraic integers that satisfy quadratic
equations, where the conjugates come in pairs $a + b\sqrt{N}$, $a - b\sqrt{N}$, and the
norm is the product of the conjugates. This includes the cases of Gaussian
integers and $\mathbb{Z}[\sqrt{-2}]$, where the product of conjugates equals the squared
absolute value. Dedekind (1871) discovered that the norm concept can be
generalized to more complicated algebraic numbers by generalizing the concept of
conjugate. We will see in Chapter 6 that an algebraic number of degree $n$ has
$n$ conjugates (including itself), and in Chapter 7 that its norm is the product of
these conjugates. Again, the multiplicative property of the norm follows from
a multiplicative property of conjugates.

However, there is another way to approach the multiplicative property. 
Readers familiar with the determinant concept from linear algebra will recall 
that the determinant function det also has a multiplicative property: 


det(AB) = det(A) det(B), 


where $A$ and $B$ are square matrices of the same size and $AB$ denotes their
matrix product. We will see in Section 6.4 that an algebraic integer $\alpha$ can
be modeled by an integer matrix whose determinant (necessarily an ordinary
integer) is the norm of $\alpha$. The multiplicative property of the norm then follows
from the multiplicative property of det. We take the opportunity to offer a
refresher course in determinants, including their multiplicative property, in
Chapter 7.


3 Quadratic Forms


Preview 


As we saw in the previous chapters, interesting Diophantine problems begin
with quadratic equations. At the heart of many problems is the question of
prime numbers given by quadratic formulas. Diophantus himself looked at
primes of the form $x^2 + y^2$, and Fermat extended his investigation to primes
of the form $x^2 + 2y^2$ and $x^2 + 3y^2$. In each of these cases, primes of the given
“quadratic form” can also be expressed in a “linear form.” An example is the
odd primes of the form $x^2 + y^2$, which we know are those of the form $4n + 1$.

But Fermat was stymied by the form $x^2 + 5y^2$, whose primes seem to defy
such a simple characterization. Lagrange probed this problem more deeply
by introducing the notions of equivalent forms, and the discriminant of a
form. He found that, when $k = 1, 2, 3$, all forms with the same discriminant
as $x^2 + ky^2$ are equivalent, but there are inequivalent forms with the same
discriminant as $x^2 + 5y^2$. This led Gauss (1801) to a thorough study of
quadratic forms and their “composition,” from which the concept of Abelian
group emerged.

The general group concept will not be used explicitly in this book, but some 
Abelian groups will be important, and also their generalization to the concept 
of module studied in Chapter 8. Quadratic forms also meet the concept of 
module via the discriminant, which originally arose from linear maps of two 
variables but was later generalized to linear maps with any finite number of 
variables. 

This chapter is somewhat less self-contained than the others in this book. 
It deals with concepts that are going to be superseded, so we sometimes give 
only sketchy explanations, with cross references to later parts of the book for 
the theory that replaces them. 


3.1 Primes of the Form $x^2 + ky^2$


Fermat’s theorem that odd primes of the form $x^2 + y^2$ are those of the form
$4n + 1$ was the first of three theorems he discovered about primes of the form
$x^2 + ky^2$, for $k = 1, 2, 3$. The first is stated in Fermat (1640), and the other two
in Fermat (1654).


Fermat’s theorems on primes. If $p$ is any odd prime and $x$ and $y$ are
integers, then


1. $p = x^2 + y^2 \iff p \equiv 1 \pmod{4}$.
2. $p = x^2 + 2y^2 \iff p \equiv 1 \text{ or } 3 \pmod{8}$.
3. $p = x^2 + 3y^2 \iff p \equiv 1 \pmod{3}$.
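
The three statements can be tested on a range of primes. The following is a minimal sketch with our own helper names and an arbitrary bound of 500. One caveat that the check makes visible: $3 = 0^2 + 3 \cdot 1^2$ is of the third form although $3 \not\equiv 1 \pmod 3$, so the third test below skips $p = 3$.

```python
from math import isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def has_form(n, k):
    """True if n = x^2 + k*y^2 for some integers x, y >= 0."""
    for y in range(isqrt(n // k) + 1):
        x2 = n - k * y * y
        if isqrt(x2) ** 2 == x2:
            return True
    return False

odd_primes = [p for p in range(3, 500) if is_prime(p)]
assert all(has_form(p, 1) == (p % 4 == 1) for p in odd_primes)
assert all(has_form(p, 2) == (p % 8 in (1, 3)) for p in odd_primes)
assert all(has_form(p, 3) == (p % 3 == 1) for p in odd_primes if p != 3)
print("checked", len(odd_primes), "odd primes")
```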


Fermat claimed to be able to prove these theorems, which was probably
true, because Euler later found proofs of them using Fermat’s methods.
However, both Fermat and Euler got stuck on the form $x^2 + 5y^2$ (which is
the next one of interest, since $x^2 + 4y^2$ is of the form $x^2 + y^2$). They left only
the following conjectures, which show a puzzling distribution of primes in the
congruence classes of 1, 3, 7, and 9 mod 20.


Fermat’s conjecture. If two primes are of the form $20n + 3$ or $20n + 7$, then
their product is of the form $x^2 + 5y^2$.


Euler’s conjecture. If $p$ is any prime, then

$$p = x^2 + 5y^2 \iff p \equiv 1 \text{ or } 9 \pmod{20},$$
$$2p = x^2 + 5y^2 \iff p \equiv 3 \text{ or } 7 \pmod{20}.$$
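
Both conjectures can be explored numerically. The sketch below uses our own helper names and an arbitrary bound of 500. It leaves out the primes 2 and 5, which is our assumption about how the statements should be read: $5 = 0^2 + 5 \cdot 1^2$ and $2 \cdot 2 = 2^2 + 5 \cdot 0^2$ are of the stated forms without lying in the stated congruence classes.

```python
from math import isqrt

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def has_form_5(n):
    """True if n = x^2 + 5*y^2 for some integers x, y >= 0."""
    for y in range(isqrt(n // 5) + 1):
        x2 = n - 5 * y * y
        if isqrt(x2) ** 2 == x2:
            return True
    return False

primes = [p for p in range(3, 500) if is_prime(p) and p != 5]

# Euler's conjecture, both parts.
assert all(has_form_5(p) == (p % 20 in (1, 9)) for p in primes)
assert all(has_form_5(2 * p) == (p % 20 in (3, 7)) for p in primes)

# Fermat's conjecture: products of two primes from the classes 3 and 7 mod 20.
special = [p for p in primes if p % 20 in (3, 7)]
assert all(has_form_5(p * q) for p in special for q in special)
print("no counterexample below 500")
```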
The anomaly of $x^2 + 5y^2$ was an irritant to mathematicians until the late
19th century, and it was soothed only by the development of several new ideas.
As Weil (1974, p. 104) said:


When there is something that is really puzzling and cannot be understood, it 
usually deserves the closest attention because some time or other some big new 
theory will emerge from it. 


Exercises 


1. Make a list of the odd primes less than 50; mark those of the form $4n + 1$,
$8n + 1$, $8n + 3$ and $3n + 1$; and check that they can be written in the forms
predicted by Fermat’s theorems.

2. List the primes less than 100 that have the form $x^2 + 5y^2$, and use them to
test the conjectures of Fermat and Euler.

Figure 3.1 Pierre de Fermat (1607-1665) and Leonhard Euler (1707-1783).
Public domain and via Getty Images, respectively.


3.2 Quadratic Integers and Quadratic Forms 


As we saw in Section 2.9, Euler had great success using the “quadratic
integers” $a + b\sqrt{-2}$ and, implicitly, exploiting their unique prime
factorization. We also saw, in Section 2.8, that the Gaussian integers $\mathbb{Z}[i]$ are a very
convenient tool for proving the first of Fermat’s theorems. Their effectiveness
is due to the factorization $x^2 + y^2 = (x - yi)(x + yi)$ and to unique prime
factorization in $\mathbb{Z}[i]$.

We might similarly attack the second Fermat theorem using the
factorization $x^2 + 2y^2 = (x - y\sqrt{-2})(x + y\sqrt{-2})$ and unique prime factorization
in $\mathbb{Z}[\sqrt{-2}]$, which was established in Section 2.9. And a similar attack can be
made on $x^2 + 3y^2 = (x - y\sqrt{-3})(x + y\sqrt{-3})$, admittedly working with a
slightly larger ring of “integers” than $\mathbb{Z}[\sqrt{-3}]$ in order to secure unique prime
factorization.

But this approach will definitely fail with $x^2 + 5y^2$ because the appropriate
ring of “integers”

$$\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} : a, b \in \mathbb{Z}\}$$

does not have unique prime factorization. We can see this from the two
factorizations of 6:

$$6 = 3 \cdot 2 = (1 - \sqrt{-5})(1 + \sqrt{-5}).$$

Figure 3.2 Joseph-Louis Lagrange (1736-1813) and Carl Friedrich Gauss (1777-1855).
Licensed under Creative Commons Attribution-ShareAlike 4.0 International License.


In $\mathbb{Z}[\sqrt{-5}]$, where the norm of $a + b\sqrt{-5}$ is $a^2 + 5b^2$, none of the numbers
$2$, $3$, $1 - \sqrt{-5}$, or $1 + \sqrt{-5}$ splits into factors of smaller norm. This is because
none of their respective norms ($2^2$, $3^2$, $6$, $6$) splits into smaller norms in $\mathbb{Z}$,
since neither 2 nor 3 is a norm. Thus $2$, $3$, $1 - \sqrt{-5}$, $1 + \sqrt{-5}$ are “primes” of
$\mathbb{Z}[\sqrt{-5}]$ in this sense, and yet the two “prime” factorizations of 6 are clearly
different.
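
The facts used in this argument are simple enough to verify mechanically. The following is a minimal sketch in which a pair `(a, b)` stands for $a + b\sqrt{-5}$; the function names `mul` and `norm` are our own.

```python
def mul(u, v):
    """Product (a + b s)(c + d s) with s*s = -5, as a pair."""
    (a, b), (c, d) = u, v
    return a * c - 5 * b * d, a * d + b * c

def norm(u):
    """Norm a^2 + 5b^2 of a + b*sqrt(-5)."""
    return u[0] ** 2 + 5 * u[1] ** 2

# The two factorizations of 6.
assert mul((3, 0), (2, 0)) == (6, 0) == mul((1, -1), (1, 1))

# Norms of 2, 3, 1 - sqrt(-5), 1 + sqrt(-5).
print([norm(u) for u in [(2, 0), (3, 0), (1, -1), (1, 1)]])

# Neither 2 nor 3 is a norm: a norm below 5 forces b = 0, so it is a square.
small_norms = {norm((a, b)) for a in range(-3, 4) for b in range(-2, 3)}
assert 2 not in small_norms and 3 not in small_norms
```

Since a proper factor of an element of norm 4, 9, or 6 would need norm 2 or 3, the last check is exactly what rules out further splitting.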

Nevertheless, this failure is revealing, as a possible explanation of the
anomalous behavior of the form $x^2 + 5y^2$ discovered by Fermat. It is revealing
to us, I should say, because in fact unique prime factorization was not identified
as a key concept of arithmetic until Gauss (1801) explicitly stated it and gave
a new proof in his Disquisitiones Arithmeticae. Moreover, by this time Gauss
had also foreseen the possible failure of unique prime factorization, so he did
not pursue the study of rings of “quadratic integers” such as $\mathbb{Z}[\sqrt{-5}]$.

Instead, Gauss made a deep study of quadratic forms: a field that had been
opened up by Lagrange (1773).


Exercises 


The number $6 = 1^2 + 5 \cdot 1^2$ is of the form $a^2 + 5b^2$, with $a = 1$, $b = 1$. Other
numbers of the form $a^2 + 5$ have similar behavior when factorized.


1. Show that $a^2 + 5$, for $a = 1, 2, 3, 4$, splits into ordinary primes not of the
form $x^2 + 5y^2$, and that each of these numbers $a^2 + 5$ has nonunique
“prime” factorization in $\mathbb{Z}[\sqrt{-5}]$.


Since unique prime factorization fails in $\mathbb{Z}[\sqrt{-5}]$ when “primes” are
defined in terms of the norm, it must be that the division property fails when
the size of remainders is measured by the norm (or, equivalently, by absolute
value), since the division property implies unique prime factorization.

Figure 3.3 Geometric view of $\mathbb{Z}[\sqrt{-5}]$.

In fact we can also see that the division property fails by a direct geometric
argument about the shape of rectangles in $\mathbb{Z}[\sqrt{-5}]$. Figure 3.3 shows some of
these numbers as points in the plane $\mathbb{C}$. They clearly form rectangles of width
1 and height $\sqrt{5}$.


2. Explain why the multiples of any nonzero $\beta \in \mathbb{Z}[\sqrt{-5}]$ form rectangles of
width $|\beta|$ and height $|\beta|\sqrt{5}$.

3. Show that a member of $\mathbb{Z}[\sqrt{-5}]$ inside one of these rectangles can lie at
distance $> |\beta|$ from the nearest corner.

4. Deduce that the division property fails for $\mathbb{Z}[\sqrt{-5}]$.


3.3 Quadratic Forms and Equivalence 


The reason for seeking primes of various forms, such as x* + y?, goes back 
to Diophantus: it is the key to finding all numbers of the given form. And 
if one wants to know all numbers of the form ax* + bxy + cy, it may be 
useful to study other forms that represent the same set of numbers. This led 
Lagrange (1773) to consider mapping the variables x, y to variables x’, y’ given 
by equations of the form 


$$x' = px + qy, \qquad y' = rx + sy, \qquad \text{where } p, q, r, s \in \mathbb{Z} \text{ and } ps - qr = \pm 1,$$

since $ps - qr = \det\begin{pmatrix} p & q \\ r & s \end{pmatrix} = \pm 1$ is the condition for the map to be a
bijection of the set $\mathbb{Z} \times \mathbb{Z}$. Such maps are called unimodular.

Under this map, the quadratic form $ax^2 + bxy + cy^2$ becomes the
equivalent form $a'x'^2 + b'x'y' + c'y'^2$. This relation is indeed an equivalence
relation because the unimodular maps include the identity (giving reflexivity),
inverses (giving symmetry), and the product of any two of them (giving
transitivity).¹

Lagrange discovered that any form equivalent to $ax^2 + bxy + cy^2$ has the
same discriminant $b^2 - 4ac$. This observation is the first step towards deciding
whether forms are equivalent, but it is not sufficient: there are inequivalent
forms with the same discriminant. In particular, $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$
both have the discriminant $-20$, but 7 is of the second form and not of the first.
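Both observations are easy to check by machine. The sketch below is illustrative only: the function names are chosen here, a form $ax^2 + bxy + cy^2$ is stored as the triple `(a, b, c)`, and `substitute` replaces $x, y$ by $px + qy$, $rx + sy$ for an integer matrix with rows `(p, q)` and `(r, s)`.

```python
def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c


def substitute(form, matrix):
    """Coefficients of f(p*x + q*y, r*x + s*y) for f = a*x^2 + b*x*y + c*y^2."""
    a, b, c = form
    (p, q), (r, s) = matrix
    return (
        a * p * p + b * p * r + c * r * r,
        2 * a * p * q + b * (p * s + q * r) + 2 * c * r * s,
        a * q * q + b * q * s + c * s * s,
    )


def represents(form, n, bound):
    """True if the form takes the value n for some |x|, |y| <= bound."""
    a, b, c = form
    values = range(-bound, bound + 1)
    return any(a * x * x + b * x * y + c * y * y == n for x in values for y in values)


f = (1, 0, 5)                      # x^2 + 5y^2
g = (2, 2, 3)                      # 2x^2 + 2xy + 3y^2
f_new = substitute(f, ((2, 1), (1, 1)))   # matrix of determinant 1
print(f_new, discriminant(f), discriminant(f_new))  # (9, 14, 6) -20 -20
print(represents(g, 7, 3), represents(f, 7, 3))     # True False
```

Both forms are positive definite, so the search bound 3 is enough to decide whether the value 7 occurs.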

On the bright side, the forms $x^2 + y^2$, $x^2 + 2y^2$, and $x^2 + 3y^2$ are each
equivalent to all forms with the same discriminant. Lagrange proved this by
a process giving each form with negative discriminant $D$ a reduced form
$ax^2 + bxy + cy^2$ with $|b| \le a \le c$. From this it follows that

$$-D = 4ac - b^2 \ge 4a^2 - a^2 = 3a^2$$

and hence there are only finitely many reduced forms with given negative
discriminant $D$. Certainly, $-D \ge 3a^2$ limits the values of $a$, hence also of $b$,
and then $-D = 4ac - b^2$ limits the values of $c$. So, one can find all reduced
forms with a given negative discriminant $D$. This leads to the following results:


1. All forms with discriminant $-4$ are equivalent to $x^2 + y^2$.
2. All forms with discriminant $-8$ are equivalent to $x^2 + 2y^2$.
3. All forms with discriminant $-12$ are equivalent to $x^2 + 3y^2$.


But, as we have seen, not all forms with $D = -20$ are equivalent. There
are two inequivalent forms with discriminant $-20$, namely $x^2 + 5y^2$ and
$2x^2 + 2xy + 3y^2$.

Thus the equivalence theory of quadratic forms reveals a difference between 
the first three discriminants and the fourth: for the first three, the number of 
equivalence classes is 1, while for the fourth it is 2. The number of equivalence 
classes of forms with discriminant $D$ is called the class number, $h(D)$.
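The bound $-D \ge 3a^2$ turns the search for reduced forms into a finite loop. The following sketch lists them for a given negative $D$. Two conventions are assumptions made here rather than statements from the text: only $b \ge 0$ is listed (the forms with $b$ and $-b$ are equivalent under $x \mapsto -x$), and only primitive forms, with $\gcd(a, b, c) = 1$, are kept. Without the second convention the form $2x^2 + 2xy + 2y^2$ would also appear for $D = -12$.

```python
from math import gcd


def reduced_forms(D):
    """Primitive reduced forms (a, b, c) with 0 <= b <= a <= c and b^2 - 4ac = D < 0."""
    if D >= 0:
        raise ValueError("D must be negative")
    forms = []
    a = 1
    while 3 * a * a <= -D:              # the bound -D >= 3a^2
        for b in range(0, a + 1):       # 0 <= b <= a
            numerator = b * b - D       # equals 4ac
            if numerator % (4 * a) == 0:
                c = numerator // (4 * a)
                if c >= a and gcd(gcd(a, b), c) == 1:
                    forms.append((a, b, c))
        a += 1
    return forms


for D in (-4, -8, -12, -20):
    print(D, reduced_forms(D))
# -4 [(1, 0, 1)]
# -8 [(1, 0, 2)]
# -12 [(1, 0, 3)]
# -20 [(1, 0, 5), (2, 2, 3)]
```

For these four discriminants the length of the list agrees with the class numbers 1, 1, 1, 2 described above.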


¹ Or, to put it in algebraic language, because the unimodular maps form a group. If the reader is
not already familiar with the three defining properties of an equivalence relation, they may be
found in Section 4.1.

It turns out that class number 1 implies unique prime factorization in the
appropriate ring of quadratic integers.² But we can see immediately how the
class number of $x^2 + y^2$ gives another proof of the Fermat two square theorem.
This is basically Lagrange’s proof, with a simplification of the crucial quadratic
form due to Gauss (1801), article 182.


Two square theorem revisited. If $p = 4n + 1$ is prime, then $p = x^2 + y^2$ for
some $x, y \in \mathbb{Z}$.


Proof. Thanks to Lagrange’s lemma, used in the first proof of this theorem in
Section 2.8, there is an integer $m$ such that $p$ divides $m^2 + 1$. Then $(m^2 + 1)/p$
is an integer, so $px^2 + 2mxy + \frac{m^2+1}{p} y^2$ is a quadratic form, with discriminant

$$(2m)^2 - 4p \cdot \frac{m^2 + 1}{p} = 4m^2 - 4(m^2 + 1) = -4.$$

Also, $px^2 + 2mxy + \frac{m^2+1}{p} y^2$ takes the value $p$ when $x = 1$ and $y = 0$.

Since this form has discriminant $-4$, it is equivalent to $x^2 + y^2$, so $x^2 + y^2$
also takes the value $p$. □
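The proof is constructive once the reduction is carried out explicitly. The sketch below is one possible way to do this and is not taken from the text: it reduces the form by the substitutions $x \mapsto x + ky$ and $x \leftrightarrow y$ (both unimodular), keeps track of the combined matrix with rows `(p, q)` and `(r, s)`, and reads off a representation of the prime from its second row. The brute-force search for $m$ and all function names are assumptions made for illustration.

```python
def reduce_form(a, b, c):
    """Reduce a positive definite form until |b| <= a <= c.

    Returns the reduced form and (p, q, r, s) such that
    reduced(x, y) = original(p*x + q*y, r*x + s*y)."""
    p, q, r, s = 1, 0, 0, 1
    while True:
        if abs(b) > a:
            k = (a - b) // (2 * a)          # puts the new b in (-a, a]
            a, b, c = a, b + 2 * a * k, a * k * k + b * k + c
            p, q, r, s = p, p * k + q, r, r * k + s     # x -> x + k*y
        elif a > c:
            a, c = c, a
            p, q, r, s = q, p, s, r                      # swap x and y
        else:
            return (a, b, c), (p, q, r, s)


def two_squares(prime):
    """For a prime of the form 4n + 1, return (x, y) with x^2 + y^2 = prime."""
    m = next(m for m in range(1, prime) if (m * m + 1) % prime == 0)
    form, (_, _, r, s) = reduce_form(prime, 2 * m, (m * m + 1) // prime)
    assert form == (1, 0, 1)    # the only reduced form of discriminant -4
    # The original form takes the value prime at (1, 0); solving
    # (p*x + q*y, r*x + s*y) = (1, 0) gives (x, y) = +-(s, -r).
    return abs(r), abs(s)


print(two_squares(5), two_squares(13))  # (1, 2) (2, 3)
```

The generator passed to `next` finds no $m$ when the input is not a prime of the form $4n + 1$, so the function is meant only for such primes.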


Exercises 


Lagrange’s result on forms of negative discriminant can also be used to prove 
the result in the exercises for Section 2.7 about primes p of the form 3n + 1. 


1. Use Lagrange’s result to show that the only reduced forms with $D = -3$
are $x^2 \pm xy + y^2$.

2. Deduce from exercise 1 of Section 2.8 that if $p = 3n + 1$, there is an
$m \in \mathbb{Z}$ such that $px^2 + (2m + 1)xy + \frac{m^2+m+1}{p} y^2$ is a quadratic form, with
$D = -3$.

3. Give another proof that the prime $p$ has the form $x^2 + xy + y^2$.


It is sometimes convenient to write each quadratic form as $ax^2 + 2bxy + cy^2$,
and to take its discriminant to be $ac - b^2$, because then

$$ax^2 + 2bxy + cy^2 = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

and the discriminant is

$$ac - b^2 = \det\begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$

² We will not explain this connection in detail, since we are going to drop the theory of quadratic
forms in favor of the theory of quadratic integers. But, just briefly, class number 1 is equivalent
to the statement that all ideals are principal, and we will show in Section 5.2 that the latter
statement implies unique prime factorization.


Now suppose we have a change of variables $x' = px + qy$, $y' = rx + sy$, with
matrix

$$T = \begin{pmatrix} p & q \\ r & s \end{pmatrix}.$$


4. Show that the form $a'x'^2 + 2b'x'y' + c'y'^2$ in the new variables has
discriminant $\det(T)^2(ac - b^2)$, which equals $ac - b^2$ if the map is
unimodular.³


3.4 Composition of Forms 


Lagrange’s theory of equivalent quadratic forms, and his discovery of the class 
number, revealed the hidden companion $2x^2 + 2xy + 3y^2$ of $x^2 + 5y^2$ and
showed that these two forms represent the two equivalence classes of forms 
with discriminant —20. Lagrange also discovered that these two forms interact 
with each other by an operation later called composition of forms. 

The first known example of the composition operation was the Diophantus 
identity of Section 2.3, which we now write as 


$$(x_1^2 + y_1^2)(x_2^2 + y_2^2) = (x_1x_2 - y_1y_2)^2 + (x_1y_2 + y_1x_2)^2 = X^2 + Y^2, \tag{*}$$

which shows that any two expressions of the form $x^2 + y^2$ have a “composite”
of the same form. Another example where two expressions of a certain form
yield another of the same form was discovered by the Indian mathematician
Brahmagupta around 600 CE:

$$(x_1^2 - ny_1^2)(x_2^2 - ny_2^2) = (x_1x_2 + ny_1y_2)^2 - n(x_1y_2 + y_1x_2)^2 = X^2 - nY^2. \tag{**}$$

Like (*), (**) can be proved by factorizing the left side and rearranging the
factors, in this case as

$$[(x_1 - \sqrt{n}\,y_1)(x_2 - \sqrt{n}\,y_2)][(x_1 + \sqrt{n}\,y_1)(x_2 + \sqrt{n}\,y_2)].$$


In each of these examples a form “composed with itself yields itself.” 
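Both identities can be spot-checked on a grid of small integers. In the sketch below (function names chosen here for illustration) the Brahmagupta identity is tested directly, and the Diophantus identity is the special case $n = -1$.

```python
from itertools import product


def compose(n, x1, y1, x2, y2):
    """The pair (X, Y) appearing on the right-hand side of Brahmagupta's identity."""
    return (x1 * x2 + n * y1 * y2, x1 * y2 + y1 * x2)


def identity_holds(n, x1, y1, x2, y2):
    X, Y = compose(n, x1, y1, x2, y2)
    left = (x1 * x1 - n * y1 * y1) * (x2 * x2 - n * y2 * y2)
    return left == X * X - n * Y * Y


def check(n, size=3):
    """Test the identity for all x1, y1, x2, y2 between -size and size."""
    values = range(-size, size + 1)
    return all(identity_holds(n, *v) for v in product(values, repeat=4))


print(all(check(n) for n in (-5, -1, 2, 3)))  # True
print(compose(-1, 1, 2, 3, 4))                # (-5, 10): 5 * 25 = 25 + 100
```

A finite check of this kind is evidence rather than proof; the proof is the rearrangement of factors described above.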


³ This exercise assumes some knowledge of determinants and their multiplicative property
$\det(AB) = \det(A)\det(B)$, at least for $2 \times 2$ matrices. However, in Chapter 7 we will develop
their general theory from scratch, in order to put this knowledge on a sound foundation.

The identity (**) with $n = -5$ shows that the form $x^2 + 5y^2$ “composed
with itself yields itself,” but Lagrange found that the forms $x^2 + 5y^2$ and
$2x^2 + 2xy + 3y^2$ have a more interesting interaction. First, the form $2x^2 + 2xy + 3y^2$
“composed with itself” does not yield itself, but rather $x^2 + 5y^2$, because

$$(2x_1^2 + 2x_1y_1 + 3y_1^2)(2x_2^2 + 2x_2y_2 + 3y_2^2) = X^2 + 5Y^2,$$

where $X = 2x_1x_2 + x_1y_2 + x_2y_1 - 2y_1y_2$ and $Y = x_1y_2 + x_2y_1 + y_1y_2$.
And second, the form $x^2 + 5y^2$ composed with $2x^2 + 2xy + 3y^2$ yields
$2x^2 + 2xy + 3y^2$ again, because

$$(x_1^2 + 5y_1^2)(2x_2^2 + 2x_2y_2 + 3y_2^2) = 2X^2 + 2XY + 3Y^2,$$

where $X = x_1x_2 - y_1x_2 - 3y_1y_2$ and $Y = x_1y_2 + 2y_1x_2 + y_1y_2$.
These stunning feats of high school algebra can be reproduced almost
mechanically by using the following factorizations in $\mathbb{Q}[\sqrt{-5}]$,

$$x^2 + 5y^2 = (x - y\sqrt{-5})(x + y\sqrt{-5}),$$

$$2x^2 + 2xy + 3y^2 = 2\left[x + \tfrac{y}{2}(1 - \sqrt{-5})\right]\left[x + \tfrac{y}{2}(1 + \sqrt{-5})\right],$$


and rearranging so as to get conjugate factors, as above for (**). 
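Lagrange's two identities for the forms of discriminant $-20$ can be confirmed on a grid of small integers before working through the factorizations. The sketch below uses the expressions for $X$ and $Y$ quoted in the text; the function names are chosen here for illustration.

```python
from itertools import product


def f(x, y):
    return x * x + 5 * y * y              # x^2 + 5y^2


def g(x, y):
    return 2 * x * x + 2 * x * y + 3 * y * y   # 2x^2 + 2xy + 3y^2


def compose_gg(x1, y1, x2, y2):
    """(X, Y) with g(x1, y1) * g(x2, y2) = f(X, Y)."""
    X = 2 * x1 * x2 + x1 * y2 + x2 * y1 - 2 * y1 * y2
    Y = x1 * y2 + x2 * y1 + y1 * y2
    return X, Y


def compose_fg(x1, y1, x2, y2):
    """(X, Y) with f(x1, y1) * g(x2, y2) = g(X, Y)."""
    X = x1 * x2 - y1 * x2 - 3 * y1 * y2
    Y = x1 * y2 + 2 * y1 * x2 + y1 * y2
    return X, Y


def identities_hold(size=3):
    for x1, y1, x2, y2 in product(range(-size, size + 1), repeat=4):
        if g(x1, y1) * g(x2, y2) != f(*compose_gg(x1, y1, x2, y2)):
            return False
        if f(x1, y1) * g(x2, y2) != g(*compose_fg(x1, y1, x2, y2)):
            return False
    return True


print(identities_hold())  # True
```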
More concisely, if the class of $x^2 + 5y^2$ is $A$ and the class of $2x^2 + 2xy + 3y^2$
is $B$, then we see that $A$ and $B$ have the following “multiplication table”:

$$A^2 = A, \qquad AB = BA = B, \qquad B^2 = A.$$

Today, we would recognize this table as defining the two-element group with
identity element $A$. It is now called the class group of $\mathbb{Q}[\sqrt{-5}]$ and is defined
in an entirely different way. We can already see the advantage of working
with the numbers $a + b\sqrt{-5}$, though admittedly the failure of unique prime
factorization in $\mathbb{Z}[\sqrt{-5}]$ might be a problem.

It was perhaps this obstacle that Gauss (1801) was trying to avoid when he 
developed a comprehensive theory of composition of quadratic forms in his 
Disquisitiones Arithmeticae. He managed to define a composition operation 
for the inequivalent forms of any discriminant, but at the cost of enormous 
complications which virtually buried the underlying algebraic structure. In 
particular the associativity of the composition operation, $A(BC) = (AB)C$,
requires the derivation of 37 equations, most of which are left to the reader!

The easier theory of quadratic integers did not emerge until the 1870s, 
only after Dirichlet and Dedekind had used them to simplify the theory of 
quadratic forms as far as it would go. Eventually, quadratic forms receded 
when Dedekind (1871) found a theory of algebraic integers of any degree,
overcoming the problem of prime factorization with his theory of ideals, and
defining the class group in terms of ideals. We introduce ideals in Section 5.2.


Exercises 


The role of the form $2x^2 + 2xy + 3y^2$ can be observed in the numbers $a^2 + 5$
studied in exercise 1 of Section 3.2.


1. Check that the ordinary prime factors of the numbers $a^2 + 5$ for
$a = 1, 2, 3, 4$ are all of the form $2x^2 + 2xy + 3y^2$.


Now let us see how Lagrange’s identities arise from factorizations in
$\mathbb{Z}[\sqrt{-5}]$:


2. Using the factorizations of $x^2 + 5y^2$ and $2x^2 + 2xy + 3y^2$ given above,
show that

$$(x_1^2 + 5y_1^2)(2x_2^2 + 2x_2y_2 + 3y_2^2) = 2(x_1 + y_1\sqrt{-5})\left[x_2 + \tfrac{y_2}{2}(1 + \sqrt{-5})\right] \times (x_1 - y_1\sqrt{-5})\left[x_2 + \tfrac{y_2}{2}(1 - \sqrt{-5})\right].$$

3. By multiplying the first pair of factors and considering conjugates, show
that the right-hand side equals

$$2\left[X + \tfrac{Y}{2}(1 + \sqrt{-5})\right]\left[X + \tfrac{Y}{2}(1 - \sqrt{-5})\right],$$

where $X = x_1x_2 - y_1x_2 - 3y_1y_2$ and $Y = x_1y_2 + 2y_1x_2 + y_1y_2$.

4. Finally, show that

$$2\left[X + \tfrac{Y}{2}(1 + \sqrt{-5})\right]\left[X + \tfrac{Y}{2}(1 - \sqrt{-5})\right] = 2X^2 + 2XY + 3Y^2.$$


3.5 Finite Abelian Groups 


Before Dedekind’s theory of algebraic integers was published, Kronecker 
(1870) published a very insightful paper proving what we now see as the 
fundamental theorem on finite Abelian groups. Motivated by the example 
of Gauss’s composition of forms, Kronecker first defined such a group: a finite
set $G$ of objects $g$ with an identity element 1 and an inverse element $g^{-1}$ for
each $g \in G$ with an operation (written as product) satisfying what we now call
the Abelian group axioms:

$$\begin{aligned}
g_1(g_2g_3) &= (g_1g_2)g_3 && \text{(associativity)} \\
g_1g_2 &= g_2g_1 && \text{(commutativity)} \\
g \cdot 1 &= g && \text{(identity)} \\
g\,g^{-1} &= 1 && \text{(inverse)}
\end{aligned}$$

Figure 3.4 A nonsquare rectangle.


His theorem showed that a finite Abelian group is a product of the simplest
Abelian groups, the cyclic groups. A group $C$ is finite and cyclic (and
necessarily commutative) if it consists of the powers $1, c, c^2, \ldots, c^{n-1}$ of an
element $c$ with $c^n = 1$. For example $c = -1$ gives a two-element cyclic group
because $c^2 = 1$. The direct product $C \times D$ of groups $C$ and $D$ consists of the
ordered pairs $(c, d)$ for $c \in C$ and $d \in D$ multiplied “coordinatewise”; that is,
according to the rule $(c_1, d_1) \cdot (c_2, d_2) = (c_1c_2, d_1d_2)$. This is similarly true for
the direct product of any number of groups. If a coordinate, say $c$, belongs to a
cyclic group of $n$ elements, then one also has the rule $c^n = 1$.


Kronecker’s theorem, in slightly modernized language, is the following. 


Structure of finite Abelian groups. Any finite Abelian group is either cyclic 
or the direct product of cyclic groups. 


Kronecker’s proof is elementary but rather long, so I will be content to 
illustrate the theorem (which we do not need for this book) with a simple 
example: the four-group. 

As its name suggests, the four-group has four elements; but it is not the
cyclic group with four elements. The most graphic way to describe the four-group $G$ is as
the group of symmetries of a nonsquare rectangle $ABCD$ (Figure 3.4). The
symmetries of $ABCD$ are the moves that take the rectangle to a position that
looks exactly the same, and they all result from flips over its horizontal and
vertical axes of symmetry (dotted and dashed lines in the figure).

Figure 3.5 The four positions.


Informally, these are the moves by which a mattress may be turned and still
fit on the bed. Experience shows there are four such moves, and they produce
the positions of $ABCD$ shown in Figure 3.5.


1. No move.

2. Flip over the horizontal axis.

3. Flip over the vertical axis.

4. Two flips, one over each axis. (This move does not depend on the order of
the flips and is a 180° rotation about the center point of the rectangle.)


Since the two flips commute, this group is Abelian. It is not cyclic because
there is no single move which, when repeated, gives all four positions. In fact,
each move, done twice, returns the rectangle to its original position.

Thus, the four-group is an Abelian group of four elements that is not cyclic.
It is the direct product of two-element cyclic groups $\{1, -1\}$, namely the
“horizontal flip group” and the “vertical flip group.” This is because each move
in the four-group may be represented by an ordered pair $(a, b)$, where $a$ is 1 for
no flip and $-1$ for a flip over the horizontal axis, and $b$ is 1 for no flip and $-1$
for a flip over the vertical axis.
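The description by ordered pairs translates directly into code. The following sketch (names chosen here for illustration) builds the four pairs, multiplies them coordinatewise, and computes the order of each element; since no element has order 4, the group cannot be cyclic.

```python
from itertools import product

# Pairs (a, b): a = -1 means a flip over the horizontal axis,
# b = -1 means a flip over the vertical axis.
ELEMENTS = list(product((1, -1), repeat=2))
IDENTITY = (1, 1)


def multiply(u, v):
    """Coordinatewise product in {1, -1} x {1, -1}."""
    return (u[0] * v[0], u[1] * v[1])


def order(g):
    """Smallest n >= 1 such that g multiplied by itself n times is the identity."""
    n, power = 1, g
    while power != IDENTITY:
        power = multiply(power, g)
        n += 1
    return n


print(len(ELEMENTS))                       # 4
print(sorted(order(g) for g in ELEMENTS))  # [1, 2, 2, 2]
```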


Exercises 


Cyclic groups of one and two elements, namely $\{1\}$ and $\{1, -1\}$, occur in the
real numbers $\mathbb{R}$, with ordinary multiplication as the group operation.


1. Show that a cyclic group of n elements occurs in the complex numbers C, 
with ordinary multiplication as the group operation. 


Cyclic groups occur elsewhere in number theory; for example, in the finite
fields $\mathbb{F}_p$ introduced in Section 1.7. It is a theorem of Gauss (1801), article 55,
that the nonzero elements of $\mathbb{F}_p$ form a cyclic group under multiplication. An
element $a \in \mathbb{F}_p$ whose powers are all the nonzero elements of $\mathbb{F}_p$ is called a
primitive root.


https://doi.org/10.1017/97810090041 38.005 Published online by Cambridge University Press 




2. Show that 2 is a primitive root for $\mathbb{F}_5$.
3. Find primitive roots for $\mathbb{F}_7$ and $\mathbb{F}_{11}$.


3.6 The Chinese Remainder Theorem 


A natural example of a finite Abelian group that splits into a direct product
occurs in the so-called Chinese remainder theorem, which is about multiplication
mod $st$. Interestingly, this example is part of the splitting of a finite ring
into a direct product. In Section 1.7 we discussed addition and multiplication
mod $p$, where $p$ is a prime number. Here we extend the idea to addition and
multiplication mod $k$, and then specialize to the case where $k = st$ where $s$
and $t$ are relatively prime; that is, $\gcd(s, t) = 1$.
In the general case we have congruence classes mod k, 


$$[a] = \{a + nk : n \in \mathbb{Z}\},$$

with the sum and the product of congruence classes given by $[a] + [b] = [a + b]$
and $[a] \cdot [b] = [ab]$. These operations are well-defined by the calculation
in Section 1.7, where we also saw that the congruence classes inherit the
ring properties from $\mathbb{Z}$. The ring of congruence classes mod $k$ is called
$\mathbb{Z}/k\mathbb{Z}$, because it can be viewed as a quotient of the ring $\mathbb{Z}$ by the ideal⁴
$k\mathbb{Z} = \{nk : n \in \mathbb{Z}\}$.

It is no longer always true that each nonzero class $[a]$ has an inverse. In fact,
$[a]$ has an inverse if and only if $\gcd(a, k) = 1$, because

$$\gcd(a, k) = 1 \iff ma + nk = 1 \text{ for some } m, n \in \mathbb{Z} \iff [a] \text{ has the inverse } [m], \text{ mod } k.$$

It follows that the invertible elements $[a]$ mod $k$ form an Abelian group, called
$(\mathbb{Z}/k\mathbb{Z})^\times$, since the set of invertible elements clearly includes the identity class
$[1]$, also the inverse of any member, and the product of any two members.
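The equivalence above is effective: the extended Euclidean algorithm produces the integers $m$ and $n$, and hence the inverse class $[m]$. The following is a minimal sketch with function names chosen here for illustration, assuming $k > 1$.

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) and m*a + n*b = g, for a, b >= 0."""
    if b == 0:
        return (a, 1, 0)
    g, m, n = extended_gcd(b, a % b)
    # g = m*b + n*(a mod b) = n*a + (m - (a // b)*n)*b
    return (g, n, m - (a // b) * n)


def inverse_mod(a, k):
    """Inverse of [a] mod k as a number in 0..k-1, or None if gcd(a, k) != 1."""
    g, m, _ = extended_gcd(a % k, k)
    return m % k if g == 1 else None


def units(k):
    """Representatives of the invertible classes mod k."""
    return [a for a in range(k) if inverse_mod(a, k) is not None]


print(units(12))            # [1, 5, 7, 11]
print(inverse_mod(7, 12))   # 7, since 7 * 7 = 49 = 4 * 12 + 1
print(inverse_mod(4, 12))   # None, since gcd(4, 12) = 4
```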


Chinese remainder theorem. When $\gcd(s, t) = 1$, the ring $\mathbb{Z}/st\mathbb{Z}$ is isomorphic
to $(\mathbb{Z}/s\mathbb{Z}) \times (\mathbb{Z}/t\mathbb{Z})$.


Proof. For any integer $a$ we will write the congruence classes of $a$ mod $st$,
mod $s$, and mod $t$, respectively, as

$$[a]_{st} = \{a + nst : n \in \mathbb{Z}\},$$
$$[a]_s = \{a + ns : n \in \mathbb{Z}\},$$
$$[a]_t = \{a + nt : n \in \mathbb{Z}\}.$$

⁴ Quotients of rings by ideals will be introduced in Section 5.3. We mention them here only to
explain where the notation $\mathbb{Z}/k\mathbb{Z}$ comes from.


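Consider the $st$ different remainders on division by $st$,

$$0, 1, 2, 3, \ldots, st - 1,$$

and compare them with the first $st$ remainders on division by $s$, which consist of
$t$ repetitions of the sequence

$$0, 1, 2, 3, \ldots, s - 1,$$

and also with the first $st$ remainders on division by $t$, which consist of $s$
repetitions of the sequence

$$0, 1, 2, 3, \ldots, t - 1.$$

Since $\gcd(s, t) = 1$, the latter two sequences coincide only at the first term, 0
(they next coincide with a 0 at position $st + 1$). Consequently no two of the $st$
positions carry the same pair of remainders: if positions $a < a'$ did, then $a' - a$
would leave remainder 0 on division by both $s$ and $t$, with $0 < a' - a < st$.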

Thus, there is a bijection between each remainder on division by $st$ and
the pair of remainders on division by $s$ and by $t$, respectively. In terms of
congruence classes, this says that

$$\psi([a]_{st}) = ([a]_s, [a]_t) \text{ is a bijection.}$$

In fact, the bijection $\psi$ is a ring isomorphism, from the ring $\mathbb{Z}/st\mathbb{Z}$ of classes
$[a]_{st}$ to the ring $(\mathbb{Z}/s\mathbb{Z}) \times (\mathbb{Z}/t\mathbb{Z})$ of ordered pairs $([a]_s, [a]_t)$, in which the
sum and product are computed mod $s$ on the first coordinate and mod $t$ on the
second coordinate.
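Before the formal calculation, the claim can be spot-checked numerically. In the sketch below the map from a class mod $st$ to its pair of classes mod $s$ and mod $t$ is called `psi`, a name chosen here, and classes are represented by their least non-negative remainders.

```python
def psi(a, s, t):
    """Send the class of a mod s*t to the pair (class mod s, class mod t)."""
    return (a % s, a % t)


def is_bijection(s, t):
    """True if the s*t classes mod s*t have s*t distinct images."""
    images = {psi(a, s, t) for a in range(s * t)}
    return len(images) == s * t


def preserves_operations(s, t):
    """True if psi respects sums and products for all pairs of classes."""
    n = s * t
    for a in range(n):
        for b in range(n):
            pa, pb = psi(a, s, t), psi(b, s, t)
            pair_sum = ((pa[0] + pb[0]) % s, (pa[1] + pb[1]) % t)
            pair_product = ((pa[0] * pb[0]) % s, (pa[1] * pb[1]) % t)
            if psi((a + b) % n, s, t) != pair_sum:
                return False
            if psi((a * b) % n, s, t) != pair_product:
                return False
    return True


print(is_bijection(4, 9), preserves_operations(4, 9))  # True True
print(is_bijection(2, 4))                              # False: gcd(2, 4) = 2
```

The second line shows why the hypothesis $\gcd(s, t) = 1$ matters: for $s = 2$, $t = 4$ the eight classes mod 8 have only four distinct images.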

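Here is the calculation showing that $\psi$ preserves products:

$$\begin{aligned}
\psi([aa']_{st}) &= ([aa']_s, [aa']_t) && \text{by definition of } \psi \\
&= ([a]_s[a']_s, [a]_t[a']_t) && \text{by definition of mod } s \text{ and mod } t \text{ product} \\
&= ([a]_s, [a]_t) \cdot ([a']_s, [a']_t) && \text{by definition of product of pairs} \\
&= \psi([a]_{st}) \cdot \psi([a']_{st}) && \text{by definition of } \psi.
\end{aligned}$$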


The calculation for sums is exactly the same, with $+$ in place of the $\cdot$ sign.
Now consider the group $(\mathbb{Z}/st\mathbb{Z})^\times$, consisting of the invertible elements
$[a]_{st}$ of the ring $\mathbb{Z}/st\mathbb{Z}$ under the product operation. The isomorphism $\psi$
sends these elements to the invertible pairs $([a]_s, [a]_t)$, and these are just the
pairs where $[a]_s$ is in $(\mathbb{Z}/s\mathbb{Z})^\times$ and $[a]_t$ is in $(\mathbb{Z}/t\mathbb{Z})^\times$.
Thus, $(\mathbb{Z}/st\mathbb{Z})^\times$ is isomorphic to $(\mathbb{Z}/s\mathbb{Z})^\times \times (\mathbb{Z}/t\mathbb{Z})^\times$. □


Exercises


1. What is the group $(\mathbb{Z}/6\mathbb{Z})^\times$?
2. Show that $(\mathbb{Z}/2p\mathbb{Z})^\times$ is a cyclic group when $p$ is prime.


The number of invertible elements, mod $k$, is called $\varphi(k)$, and $\varphi$ is known
as the Euler phi function. It is also the number of congruence classes $[a]_k$ for
which $\gcd(a, k) = 1$, as we observed above. A fundamental property of the $\varphi$
function follows from the Chinese remainder theorem.


3. Deduce from the Chinese remainder theorem that if $\gcd(s, t) = 1$, then
$\varphi(st) = \varphi(s)\varphi(t)$.

4. It follows from exercise 3 that $\varphi(m)$ may be calculated from a prime
factorization of $m$. Thus, it suffices to know $\varphi(p^n)$ for each prime $p$. Show
that $\varphi(p^n) = (p - 1)p^{n-1}$.


3.7 Additive Notation for Abelian Groups 


In the last two sections we have given two historical examples of Abelian 
groups whose group operation is naturally viewed as “multiplication.” We will 
later see many Abelian groups whose group operation is naturally viewed as 
addition. In fact, the oldest example of all — the integers — is another whose 
group operation is addition, not multiplication (because most integers have no 
multiplicative inverse, but they all have an additive inverse). Therefore, it is 
important to be aware of additive notation for Abelian groups, since this will 
often be the natural notation to use. 

In additive notation, the group operation is written $+$, the identity element is
written 0, and the inverse of element $g$ is written $-g$, so the defining properties
or Abelian group axioms from Section 3.5 become:

$$\begin{aligned}
g_1 + (g_2 + g_3) &= (g_1 + g_2) + g_3 && \text{(associativity)} \\
g_1 + g_2 &= g_2 + g_1 && \text{(commutativity)} \\
g + 0 &= g && \text{(identity)} \\
g + (-g) &= 0 && \text{(inverse)}
\end{aligned}$$


Since all of these properties hold for integer addition, it is now obvious
why $\mathbb{Z}$ is an Abelian group under addition. Other examples are addition of
congruence classes mod $n$, for any nonzero integer $n$. If $[g]$ denotes the
congruence class of the integer $g$, mod $n$, then the set $\mathbb{Z}/n\mathbb{Z}$ of these congruence classes
inherits the Abelian group properties of addition from $\mathbb{Z}$. We saw this for the
commutative property in Section 1.7, and it is similar for the others.

Alongside additive notation for individual groups, we have the notation for
direct sum of two groups $G$ and $H$, $G \oplus H$, in place of the direct product
notation $G \times H$. Since we write members of $G \times H$ as ordered pairs $(g, h)$,
we can also write members of $G \oplus H$ as ordered pairs $(g, h)$, but “add
componentwise,”

$$(g_1, h_1) + (g_2, h_2) = (g_1 + g_2, h_1 + h_2),$$

where $+$ is the group operation in $G$ and $H$. An alternative (common when
working with vectors) is to write $(g, h)$ as $g\mathbf{i} + h\mathbf{j}$, then add the $\mathbf{i}$ and $\mathbf{j}$
components separately.

A variation of this idea that everyone knows is the notation a + bi for 
complex numbers. Here a and b are real numbers, and the sum of two complex 
numbers is defined by 


$$(a_1 + b_1 i) + (a_2 + b_2 i) = (a_1 + a_2) + (b_1 + b_2)i.$$

This shows that the Abelian group of complex numbers $\mathbb{C}$, under addition, is
the same as the direct sum $\mathbb{R} \oplus \mathbb{R}$ of the real numbers with itself.


Exercises 


An Abelian group that should be viewed both multiplicatively and additively
came up in the exercises to Section 1.5. There we studied solutions $x = a$,
$y = b$ of the equation $x^2 - 2y^2 = 1$ through their “proxies” $a + b\sqrt{2}$ and (for
$a > 0$) $\log(a + b\sqrt{2})$.


1. Show that the numbers $a + b\sqrt{2}$, where $a$ and $b$ are integers and
$a^2 - 2b^2 = 1$, form an Abelian group under multiplication.

2. Show that the subset of these numbers for which $a > 0$ is also an
Abelian group.

3. Show that numbers $\log(a + b\sqrt{2})$, where $a > 0$ and $b$ are integers such
that $a^2 - 2b^2 = 1$, form an Abelian group under addition.


3.8 Discussion 


The topic of quadratic forms sketched in this chapter was an important stepping
stone from classical arithmetic to algebraic number theory and to some
concepts of commutative algebra, such as Abelian groups and the discriminant.
However, we view it mainly as a topic of historical interest, which motivates
later developments but is superseded by them. In particular, in the rest of
the book the concept of quadratic form itself will be replaced by the more
enlightening concept of quadratic integer.


3.8.1 Quadratic Forms and Quadratic Integers 


Quadratic forms have played an important role in the development of algebra. 
On the one hand, they had been studied since ancient times in quadratic 
Diophantine equations, such as the Pythagorean equation and the Pell equation, 
so they are of abiding interest. On the other hand, they have problematic 
behavior: there are inequivalent forms with the same discriminant, which leads 
to failure of unique prime factorization in the corresponding quadratic integers. 
This created a demand for some replacement for unique prime factorization. 
Kummer expressed the demand, or hope, as follows: 


It is greatly to be lamented that this virtue of the real numbers [that is, ordinary 
integers] to be decomposable into prime factors, always the same ones for a given 
number, does not also belong to the complex numbers [that is, algebraic integers]; 
were this the case, the whole theory, which is still laboring under such difficulties, 
could easily be brought to its conclusion. For this reason, the complex numbers we 
have been considering seem imperfect, and one may well ask whether one could 
not look for another kind which would preserve the analogy with the real numbers 
with respect to such a fundamental property. 

(Translation by Weil (1975) from Kummer (1844)). 


Dedekind felt the loss of unique prime factorization as keenly as Kummer, 
so he had the utmost admiration for Kummer’s way out of the difficulty. 


But the more hopeless one feels about the prospects of later research on such 
numerical domains [of algebraic integers], the more one has to admire the steadfast 
efforts of Kummer, which were finally rewarded by a truly great and fruitful 
discovery. 

(Translation in (Dedekind, 1996, p. 56)). 


Kummer’s way out was the introduction of what he called “ideal numbers,” but 
he did not define what these “numbers” were. Dedekind, inspired by Kummer, 
gave the “ideal numbers” a concrete existence as what he called ideals. Today, 
ideals are the first topic one meets in ring theory, though usually without 
learning where they came from. 

Ideals had a long and difficult birth. As we have seen, mathematicians were
reluctant to give up quadratic forms and accept quadratic integers. In fact, the
move to quadratic integers was achieved only with Dedekind’s construction of
a general theory of algebraic integers of arbitrary degree.

Figure 3.6 Ernst Eduard Kummer (1810-1893). Courtesy of the Mathematische
Gesellschaft in Hamburg.


3.8.2 From Quadratic Forms to Algebraic Integers 


The problematic behavior of quadratic forms was initially met by complicated 
and ingenious, but still traditional, algebra. First there was Lagrange’s method 
of reduction, which had some success with forms of negative discriminant. 
Then there was Gauss’s theory of composition of forms, in Gauss (1801), 
which could handle forms of arbitrary discriminant but was formidably 
complicated. Gauss’s theory, which fills over 200 pages in his Disquisitiones
Arithmeticae, was impenetrable to most mathematicians, and the Disquisitiones
gained a reputation as a “book with seven seals.”

After decades of work, Dirichlet (1863) produced a simplified, and certainly 
more readable, version of Gauss’s theory. But composition of forms was still 
a mystery. As we have seen, the Abelian group structure of composition 
was not brought to light until Kronecker (1870) made it explicit and proved 
the fundamental theorem of finite Abelian groups. Then in 1871 Dedekind 
produced a new edition of Dirichlet’s number theory, with supplements he had 
written himself. These included the first version of his theory of ideals, though 
Dedekind did not yet know what to call it. He included it in his Supplement X,
called “On the Composition of Binary Quadratic Forms.”

In the 1871 edition, only the first 43 pages of Supplement X (§§145-158)
are about composition of quadratic forms. The next 75 pages (§§159-170) turn
toward algebraic number fields, algebraic integers, and ideals. In Dedekind’s
next edition of Dirichlet, in 1879, the ideal theory is no longer buried in “On
the Composition of Binary Quadratic Forms.” It gets its own Supplement XI,
called “On the Theory of Algebraic Integers,” which now fills §§159-181
(nearly 200 pages). The old §§145-158 remain the same. Indeed, it seems
likely that Dedekind kept §§145-158 mainly for the record, since §181 of the
new Supplement on algebraic integers explains composition of quadratic forms
in the language of quadratic integers (and in 17 pages).

In 1894 Dedekind wrote a third version, with §§145-158 still unchanged, 
but expanding “On the Theory of Algebraic Integers” to over 220 pages. Again, 
the final § of Supplement XI is the one from 1879 on composition of quadratic 
forms, slightly revised. It seems clear that, by this stage, Dedekind considered 
composition of forms to be a branch of the theory of algebraic integers, and of 
the related theory of rings and ideals. 

In our next two chapters we will see how algebraic integers found a place in 
the theory of fields and rings, and how the problematic behavior of quadratic 
forms brought to light the concept of ideal. Thus, the problems arising from 
quadratic forms were eventually resolved by abstraction: choosing the right 
abstract concepts to make the problems clear. Nevertheless, at each stage, the 
choice of abstract concepts was guided by concrete examples from the world 
of algebraic integers. 

It is noteworthy that Emmy Noether, who is responsible more than anyone
else for the abstract theory of rings and ideals, urged her students to read all
three versions of Dedekind’s Supplements.


Preview


In previous chapters we have seen that the basic concepts of addition and 
multiplication on the ordinary integers quickly lead to difficult problems, even 
in finding integer solutions to quadratic and cubic equations. 

At the same time, we have seen that the basic properties of addition and 
multiplication can be usefully encapsulated in the algebraic structures of rings 
and fields. These structures provide a setting for the generalized “integers” 
that illuminate the ordinary integers, and for the polynomials that define 
generalized integers. 

We have also seen that, in a generalized integer setting, the property of 
unique prime factorization may fail. This creates a serious obstacle to the use 
of generalized integers, since success often depends on imitating arguments 
that are valid for ordinary integers. 

Overcoming this obstacle requires a general investigation of rings and
fields, with the ultimate aim of finding a substitute, ideal prime factorization,
that succeeds in cases where ordinary prime factorization fails. The
models for this investigation are the algebraic number fields and the rings of
algebraic integers they contain.

The main aim of this chapter is to investigate algebraic number fields and to
describe them as simply as possible. Once this is done, we define the integers
of an algebraic number field $E$ and prove that they form a ring $\mathbb{Z}_E$. The
properties of $\mathbb{Z}_E$ unfold alongside a deeper investigation of rings that begins in
the next chapter.


4.1 Integers and Fractions


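As we said in Sections 1.6 and 1.7, the system $\mathbb{Z}$ of ordinary integers satisfies
the ring axioms

$$\begin{array}{lll}
a + b = b + a & ab = ba & \text{(commutative laws)} \\
a + (b + c) = (a + b) + c & a(bc) = (ab)c & \text{(associative laws)} \\
a + (-a) = 0 & & \text{(inverse law)} \\
a + 0 = a & a \cdot 1 = a & \text{(identity laws)} \\
a(b + c) = ab + ac & & \text{(distributive law)}
\end{array}$$

and the system $\mathbb{Q}$ of rational numbers satisfies the field axioms

$$\begin{array}{lll}
a + b = b + a & ab = ba & \text{(commutative laws)} \\
a + (b + c) = (a + b) + c & a(bc) = (ab)c & \text{(associative laws)} \\
a + (-a) = 0 & aa^{-1} = 1 \text{ for } a \ne 0 & \text{(inverse law)} \\
a + 0 = a & a \cdot 1 = a & \text{(identity laws)} \\
a(b + c) = ab + ac & & \text{(distributive law)}
\end{array}$$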


However, it is worth reviewing the step that takes us from Z to Q — the 
construction of fractions — because we tend to forget what an important (and 
complicated!) step it is. By recalling this step in detail, we will be able to see 
how it generalizes to a larger class of rings. 

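The first thing to recall about fractions is that many fractions represent the
same rational number. For example,

$$\frac{1}{2} = \frac{-1}{-2} = \frac{2}{4} = \frac{-2}{-4} = \frac{3}{6} = \frac{-3}{-6} = \cdots.$$

In general, if $a, b, a', b' \in \mathbb{Z}$ with $b, b' \ne 0$, then

$$\frac{a}{b} = \frac{a'}{b'} \iff ab' = a'b.$$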
We may view a fraction $a/b$ abstractly as just an ordered pair $(a, b)$ of integers
with $b \ne 0$, and call two pairs equivalent when they represent equal fractions.
Then a rational number may be formally defined as an equivalence class of
these pairs. Denoting equivalence by the symbol $\equiv$, we have

$$(a, b) \equiv (a', b') \iff ab' = a'b. \tag{*}$$

The relation $\equiv$ is an example of an equivalence relation because it has the
properties of reflexivity, symmetry, and transitivity:

$$\begin{aligned}
&(a, b) \equiv (a, b) && \text{(reflexivity)} \\
&(a, b) \equiv (a', b') \Rightarrow (a', b') \equiv (a, b) && \text{(symmetry)} \\
&(a, b) \equiv (a', b') \text{ and } (a', b') \equiv (a'', b'') \Rightarrow (a, b) \equiv (a'', b'') && \text{(transitivity)}
\end{aligned}$$

The first two properties follow easily from the definition (*) and the corresponding
properties of equality. To prove transitivity we need to prove the
equivalent statement

$$ab' = a'b \text{ and } a'b'' = a''b' \Rightarrow ab'' = a''b.$$

Well,

$$\begin{aligned}
& ab' = a'b \text{ and } a'b'' = a''b' \\
&\Rightarrow ab'b'' = a'bb'' \text{ and } a'bb'' = a''bb' && \text{(multiplying first by } b''\text{, second by } b\text{)} \\
&\Rightarrow ab'b'' = a''bb' && \text{(by transitivity of equality)} \\
&\Rightarrow 0 = ab'b'' - a''bb' = b'(ab'' - a''b) \\
&\Rightarrow 0 = ab'' - a''b && \text{(because } b' \ne 0\text{)} \\
&\Rightarrow ab'' = a''b,
\end{aligned}$$

as required. In the penultimate line we have used the property that $\mathbb{Z}$ has no
zero divisors: that is, if $x \ne 0$ and $y \ne 0$, then $xy \ne 0$.

The “no zero divisors” property is also needed for the second important fact
about fractions: that they have a well-defined sum and product. Both the sum
and product of $a/b$ and $a'/b'$ have the denominator $bb'$, so it is important that
$bb' \ne 0$ when $b \ne 0$ and $b' \ne 0$. The definitions are

$$\frac{a}{b} + \frac{a'}{b'} = \frac{ab' + a'b}{bb'}, \qquad \frac{a}{b} \cdot \frac{a'}{b'} = \frac{aa'}{bb'}.$$

Of course we all know that fractions behave like this, but we tend to forget how
frustrating their behavior seemed at first encounter (especially the sum: why
can’t the answer be $\frac{a+a'}{b+b'}$?). Writing the definitions in terms of ordered pairs
may help to reawaken the difficulty:

$$(a, b) + (a', b') = (ab' + a'b, bb') \quad \text{and} \quad (a, b) \cdot (a', b') = (aa', bb'). \tag{**}$$


It is not obvious that these definitions are independent of the pair chosen to 
represent a given equivalence class, and to check independence we have to go 
through maneuvers like those above in the proof of transitivity. 

If $(a', b')$ is replaced by $(a'', b'')$ in the definition (**), then the pair for
the sum, $(ab' + a'b,\ bb')$, is replaced by $(ab'' + a''b,\ bb'')$. According to the
definition (*) of equivalence, the latter two pairs are equivalent just in case

\[ (ab' + a'b)\,bb'' = (ab'' + a''b)\,bb'. \]

Subtracting $abb'b''$ from both sides leaves

\[ a'b^2b'' = a''b^2b', \quad\text{or}\quad b^2(a'b'' - a''b') = 0, \]

which means, since $b \neq 0$ and there are no zero divisors, that

\[ a'b'' - a''b' = 0, \quad\text{or}\quad a'b'' = a''b'. \]

This is precisely the condition for $(a', b')$ to be equivalent to $(a'', b'')$. There is
a similar proof that the product of equivalence classes is well defined.
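
As a concrete check of independence from representatives, the sketch below applies the definitions (**) to two different pairs representing the same class. The helper names `equivalent`, `add`, and `mul` are ours, and the particular pairs are arbitrary; this is one instance of the argument above, under the assumption that all second components are nonzero.

```python
def equivalent(p, q):
    """Definition (*): (a, b) is equivalent to (a2, b2) when a*b2 == a2*b."""
    (a, b), (a2, b2) = p, q
    return a * b2 == a2 * b

def add(p, q):
    """Sum of pairs, following definition (**)."""
    (a, b), (a2, b2) = p, q
    return (a * b2 + a2 * b, b * b2)

def mul(p, q):
    """Product of pairs, following definition (**)."""
    (a, b), (a2, b2) = p, q
    return (a * a2, b * b2)

p = (1, 2)
q, q_alt = (1, 3), (2, 6)      # two representatives of the same class

# The resulting pairs differ, but they are equivalent in the sense of (*).
sum_ok = equivalent(add(p, q), add(p, q_alt))
product_ok = equivalent(mul(p, q), mul(p, q_alt))
```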

Finally, we can prove from the definitions of sum and product that the 
equivalence classes of fractions form a field, Q. The proofs of the field 
properties are mainly routine applications of the definitions (**), combined 
with the known ring properties of Z. 

For example, to prove that addition in $\mathbb{Q}$ is commutative, consider

\begin{align*}
(a, b) + (a', b') &= (ab' + a'b,\ bb') && \text{(by definition of $+$)} \\
&= (a'b + ab',\ b'b) && \text{(by commutativity in $\mathbb{Z}$)} \\
&= (a', b') + (a, b) && \text{(by definition of $+$)}
\end{align*}

The other ring properties are proved similarly. Finally we have to prove that a
nonzero $(a, b)$ has a multiplicative inverse. This is true because, when $a, b \neq 0$,

\[ (a, b) \cdot (b, a) = (ab, ab) \equiv (1, 1), \]

so $(b, a)$ is inverse to $(a, b)$. We saw in Section 1.7 that inverses are unique.


Exercises 


The idea that $\frac{a}{b} + \frac{a'}{b'}$ should be $\frac{a+a'}{b+b'}$ is false because $\frac{a+a'}{b+b'}$ fails to be independent
of equivalence class representatives.


1. Give an example showing that $\frac{a+a'}{b+b'}$ has different values for different, but
equivalent, pairs $(a', b')$.


The relation $a \equiv b \pmod{n}$ introduced in Sections 1.6 and 1.7 is an
equivalence relation. We glossed over that fact at the time, but here is a check.


2. Explain why

\begin{align*}
& a \equiv a \pmod{n} \\
& a \equiv b \pmod{n} \iff b \equiv a \pmod{n} \\
& a \equiv b \pmod{n} \text{ and } b \equiv c \pmod{n} \implies a \equiv c \pmod{n}
\end{align*}


This example also illustrates a property of equivalence classes, where the 
equivalence class of element a consists of all elements equivalent to a. Namely, 
any two equivalence classes are either identical or disjoint. For the relation of 
congruence mod n the equivalence classes are the congruence classes [a]. 


3. Prove that if $[a] \neq [b]$, then $[a]$ and $[b]$ have no element in common.
4. Prove a corresponding result for any equivalence relation. 


https://doi.org/10.1017/97810090041 38.006 Published online by Cambridge University Press 




4.2 Domains and Fields of Fractions 


We gave the painfully detailed treatment of fractions above because exactly the 
same definitions and proofs apply in any ring with no zero divisors. 


Definitions. A domain$^1$ $D$ is a ring with no zero divisors; that is, if $a, b \in D$
are nonzero, then so is $ab$. The fractions of $D$ are the ordered pairs $(a, b)$, with
$b \neq 0$ and sum and product defined by

\[ (a, b) + (a', b') = (ab' + a'b,\ bb') \quad\text{and}\quad (a, b) \cdot (a', b') = (aa',\ bb'). \qquad (**) \]

Their equivalence classes form the field of fractions of $D$, $\operatorname{Frac}(D)$.


Any field $F$ is an example of a domain, because if $a, b \in F$ with $ab = 0$ and
$b \neq 0$, then $0 = abb^{-1} = a$. It is also clear that $F$ is its own field of fractions.

Other examples, which will be particularly important in this book, are rings
of algebraic integers. We will give a general definition of these rings later,
but a good example is the ring $\mathbb{Z}[\sqrt{2}] = \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$ introduced in
Section 1.6.

$\mathbb{Z}[\sqrt{2}]$ has no zero divisors, because its elements are all real numbers, and
the real numbers include no zero divisors. Thus, we can form fractions, which
we will write in the usual way. The field of fractions of $\mathbb{Z}[\sqrt{2}]$ includes all the
quotients of $a + b\sqrt{2}$, where $a, b \in \mathbb{Z}$, by $c \in \mathbb{Z}$, which are all the numbers of
the form

\[ p + q\sqrt{2} \quad\text{where } p, q \in \mathbb{Q}. \qquad (*) \]

Conversely, any quotient of members of $\mathbb{Z}[\sqrt{2}]$ is of this form. To see why, it
suffices to check that multiplicative inverses of members of $\mathbb{Z}[\sqrt{2}]$ are of the
form (*), because the product of numbers of the form (*) is obviously of the
same form. As for the inverse, observe that

\[ \frac{1}{a + b\sqrt{2}} = \frac{a - b\sqrt{2}}{(a - b\sqrt{2})(a + b\sqrt{2})} = \frac{a - b\sqrt{2}}{a^2 - 2b^2} = \frac{a}{a^2 - 2b^2} - \frac{b}{a^2 - 2b^2}\sqrt{2}, \]

which is of the form (*). Thus the field of fractions of $\mathbb{Z}[\sqrt{2}]$ is the field of
numbers of the form (*), which we call $\mathbb{Q}(\sqrt{2})$. (The round brackets indicate
that we allow the formation of quotients as well as sums, differences, and
products.)
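
The inverse formula can be checked with exact rational arithmetic. In the following sketch a number $p + q\sqrt{2}$ is stored as the pair `(p, q)`; the function names `inverse` and `multiply` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def inverse(a, b):
    """Inverse of a + b*sqrt(2), returned as (p, q) meaning p + q*sqrt(2)."""
    a, b = Fraction(a), Fraction(b)
    norm = a * a - 2 * b * b          # (a - b*sqrt2)(a + b*sqrt2)
    if norm == 0:                     # only for a = b = 0, since sqrt(2) is irrational
        raise ZeroDivisionError("0 has no inverse")
    return (a / norm, -b / norm)

def multiply(x, y):
    """Product of p + q*sqrt2 and r + s*sqrt2, using (sqrt2)^2 = 2."""
    (p, q), (r, s) = x, y
    return (p * r + 2 * q * s, p * s + q * r)

inv = inverse(3, 1)                                   # 1 / (3 + sqrt2)
check = multiply((Fraction(3), Fraction(1)), inv)     # should be 1 + 0*sqrt2
```

Here `inv` is the pair `(3/7, -1/7)`, in agreement with the displayed formula for $a = 3$, $b = 1$.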


1 Many algebra books call these rings integral domains, but the adjective “integral” occurs
elsewhere in ring theory, as we will see, so it is best to avoid it here. 




But there are many examples of rings that do have zero divisors, and hence
cannot be extended to fields. An example is the ring of congruence classes
mod 6, which came up in Section 1.7. In the congruence classes mod 6 we
have $[2] \neq [0]$ and $[3] \neq [0]$ but $[2][3] = [6] = [0]$.
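
A short search makes the contrast between moduli visible. The sketch below lists pairs of nonzero residues whose product is zero mod $n$; the function name `zero_divisor_pairs` is our own.

```python
def zero_divisor_pairs(n):
    """All pairs (a, b) with 0 < a <= b < n and a*b congruent to 0 mod n."""
    return [(a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0]

mod6 = zero_divisor_pairs(6)   # includes (2, 3), the example in the text
mod7 = zero_divisor_pairs(7)   # empty: the classes mod a prime have no zero divisors
```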


Exercises 


The fraction field of $\mathbb{Z}[\sqrt{2}]$ illustrates a property we will later observe for
other fields $F$ of algebraic numbers: its elements are not merely quotients of
algebraic integers, but quotients of the special form

\[ \frac{\text{integer of } F}{\text{ordinary integer}}. \]


1. If $a, b, c, d \in \mathbb{Z}$, write $\frac{a + b\sqrt{2}}{c + d\sqrt{2}}$ in the form $\frac{l + m\sqrt{2}}{n}$, where $l, m, n \in \mathbb{Z}$.


2. Prove a similar result for quotients of members of $\mathbb{Z}[\sqrt{3}]$.
3. Find TEESE showing that it is actually in Z[2'/3]. 


4.3 Polynomial Rings 


Since its emergence as a separate mathematical discipline in the 16th century, 
algebra has involved polynomials — objects obtained from numbers and 
“unknowns” (denoted by x, y,z, ... and other letters) by addition, subtraction, 
and multiplication. From the beginning it was recognized that polynomials 
obey the same rules of calculation as numbers; in other words, they have the 
ring properties, as we observed in Section 1.6. 

Now we single out certain collections of polynomials, called polynomial 
rings, for special attention. Among them are: 


1. Z[x], which consists of the polynomials in one unknown, x, with integer 
coefficients. 

2. Q[x], which consists of the polynomials in one unknown, x, with rational
coefficients. 

3. R[x], which consists of the polynomials in one unknown, x, with 
coefficients from a ring R. Or F[x], which consists of the polynomials in 
one unknown, x, with coefficients from a field F. 


In each case it is clear that the sum, difference, and product of two members of
the ring is another member of the ring, and that the ring properties are satisfied.
Thus, each $R[x]$, where $R$ is a ring, is itself a ring. A typical polynomial in
$R[x]$ takes the form

\[ p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \quad\text{where } a_0, a_1, \ldots, a_n \in R. \]

Without loss of generality we can assume $a_n \neq 0$, in which case $p(x)$ is said to
have degree $n$. (Unless $n = 0$, in which case we can have $p(x) = a_0$, which is
of degree 0 if $a_0 \neq 0$ and by convention degree $-\infty$ if $a_0 = 0$.)

Since a ring $R$ gives a ring $R_1 = R[x_1]$, we can use $R_1$ in turn to form the
ring $R_2 = R_1[x_2] = R[x_1, x_2]$, which consists of the two-variable polynomials
with coefficients in $R$. Continuing in this way, we see that the $n$-variable
polynomials with coefficients in $R$ form a ring $R[x_1, x_2, \ldots, x_n]$.

For suitable rings $R$, the polynomial ring $R[x]$ resembles $\mathbb{Z}$ in more than
just the ring properties. If $R$ is a domain, then $R[x]$ is also a domain, so it
has a field of fractions, denoted by $R(x)$. If $R$ is a field $F$, $F[x]$ also has a
Euclidean algorithm and unique prime factorization. We observed this in the
case of $F = \mathbb{Q}$ in Section 1.6, and the proof is exactly the same for coefficients
in any field, since it involves only division and subtraction.
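
To make the Euclidean algorithm in $F[x]$ concrete, here is a minimal Python sketch for $F = \mathbb{Q}$. It assumes a polynomial is stored as a list of coefficients, lowest degree first, with the empty list standing for the zero polynomial; the names `trim`, `poly_divmod`, and `poly_gcd` are our own. The one place where a field is needed is the division of leading coefficients.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Return (q, r) with a = q*b + r and deg r < deg b, coefficients in Q."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = a
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]          # this division is why we need a field
        q[shift] = factor
        for i, c in enumerate(b):
            r[i + shift] -= factor * c  # cancels the leading term of r
        r = trim(r)
    return trim(q), r

def poly_gcd(a, b):
    """Monic gcd of a and b by repeated division with remainder."""
    a = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    while b:
        a, b = b, poly_divmod(a, b)[1]
    return [c / a[-1] for c in a] if a else a

# x^2 - 1 and x^2 - 3x + 2 share the factor x - 1.
g = poly_gcd([-1, 0, 1], [2, -3, 1])
```

The gcd is normalized to be monic, which is one common convention; any nonzero constant multiple would serve equally well as a gcd.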

However, this is a good place to note that the gcd of two polynomials in
$F[x]$ does not change when we allow divisors from $F'[x]$, for some field
$F' \supseteq F$. Such divisors cannot arise, since the steps in the Euclidean algorithm
do not produce any coefficients outside $F$. In particular, if we know the gcd by
some other means (such as factorization in a larger $F'[x]$) then we know that
its coefficients are in $F$. This observation is used in Section 6.5.


4.3.1 Congruence Modulo a Polynomial 


The analogy between $\mathbb{Z}$ and $F[x]$ goes even further. As we saw in Section
1.6, the step between the Euclidean algorithm and unique prime factorization
involves an expression for the gcd of two polynomials $a(x)$ and $b(x)$:

\[ \gcd(a(x), b(x)) = a(x)m(x) + b(x)n(x), \]

for some polynomials $m(x)$ and $n(x)$.
This too holds in $F[x]$ for any field $F$, because the division property in $F[x]$
(and hence the Euclidean algorithm) holds by exactly the argument given in
Section 1.6. Another consequence of this form of the gcd is the existence of
inverses modulo a prime polynomial $p(x)$.


Definitions. A nonconstant polynomial $p(x) \in R[x]$ is prime or irreducible
over $R$ if it is not the product of lower-degree polynomials in $R[x]$. Polynomials
$s(x)$ and $t(x)$ in $R[x]$ are congruent mod $p(x)$, written as

\[ s(x) \equiv t(x) \pmod{p(x)}, \]

if $p(x)$ divides $s(x) - t(x)$. The set of polynomials congruent to $s(x)$, mod
$p(x)$, is called the congruence class of $s(x)$, written as $[s(x)]$.


Existence of inverses. If $F$ is a field and $a(x)$ is not a multiple of the
irreducible $p(x)$, then there is a polynomial $m(x) \in F[x]$ such that

\[ a(x)m(x) \equiv 1 \pmod{p(x)}. \]


Proof. It follows from the Euclidean algorithm that, for any $a(x), b(x) \in F[x]$
there are $m(x), n(x) \in F[x]$ such that

\[ \gcd(a(x), b(x)) = a(x)m(x) + b(x)n(x). \]

In particular, if $a(x)$ is not a multiple of the irreducible $p(x)$,

\[ 1 = \gcd(a(x), p(x)) = a(x)m(x) + p(x)n(x). \]

Thus, $p(x)$ divides $a(x)m(x) - 1$, which means

\[ a(x)m(x) \equiv 1 \pmod{p(x)}. \qquad \square \]

Corollary. The congruence classes mod an irreducible $p(x)$ form a field.


Proof. The last congruence says that $[a(x)][m(x)] = [1]$, for any $a(x)$ not a
multiple of $p(x)$; that is, for any nonzero congruence class $[a(x)]$. So each
nonzero class $[a(x)]$ has a multiplicative inverse, $[m(x)]$. The other field
properties of congruence classes (namely, the ring properties) are inherited
from the ring properties of $F[x]$, exactly as for congruence classes in $\mathbb{Z}$
(Section 1.7). □
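
The proof is constructive: running the Euclidean algorithm while keeping track of the multiplier of $a(x)$ yields $m(x)$. The following Python sketch does this over $\mathbb{Q}$, again assuming coefficient lists with the lowest degree first. All function names are our own, and the code assumes $p(x)$ is nonzero and coprime to $a(x)$ (it raises an error otherwise).

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def sub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return trim(x - y for x, y in zip(a, b))

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Return (q, r) with a = q*b + r and deg r < deg b."""
    r = trim(Fraction(c) for c in a)
    b = trim(Fraction(c) for c in b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 0)
    while len(r) >= len(b):
        shift, factor = len(r) - len(b), r[-1] / b[-1]
        q[shift] = factor
        r = sub(r, mul([Fraction(0)] * shift + [factor], b))
    return trim(q), r

def inverse_mod(a, p):
    """Return m with a*m congruent to 1 mod p, assuming gcd(a, p) = 1."""
    p = trim(Fraction(c) for c in p)
    # Invariant: r0 = m0*a and r1 = m1*a, both modulo p.
    r0, m0 = p, []
    r1, m1 = divmod_poly(a, p)[1], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, m0, r1, m1 = r1, m1, r, sub(m0, mul(q, m1))
    if len(r0) != 1:
        raise ValueError("a(x) and p(x) are not coprime")
    m = [c / r0[0] for c in m0]      # rescale so that the gcd becomes 1
    return divmod_poly(m, p)[1]

# Inverse of x^2 - 1 modulo x^3 - 2.
m = inverse_mod([-1, 0, 1], [-2, 0, 0, 1])
```

For this input the result is the coefficient list `[1/3, 2/3, 1/3]`, that is, $(1 + x)^2/3$.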


Exercises 


It should be noted that the Euclidean algorithm for polynomials (like the one
for integers, but using division with remainder rather than subtraction) extends
to an algorithm that computes the polynomials $m(x)$ and $n(x)$ such that

\[ \gcd(a(x), b(x)) = a(x)m(x) + b(x)n(x). \]

We illustrate with an example from the exercises to Section 1.6.

1. Show

\[ \gcd(x^3 - 2, x^2 - 1) = \gcd(x^2 - 1, x - 2) = \gcd(x - 2, 3) = 1, \]

via the following divisions with remainder:

\begin{align*}
x^3 - 2 &= (x^2 - 1)x + x - 2 && \text{(dividing $x^3 - 2$ by $x^2 - 1$)} \\
x^2 - 1 &= (x - 2)(x + 2) + 3 && \text{(dividing $x^2 - 1$ by $x - 2$)}
\end{align*}

2. Deduce, by substituting $x - 2 = x^3 - 2 - (x^2 - 1)x$ from the first
equation into the second, that

\[ 1 = (x^3 - 2)m(x) + (x^2 - 1)n(x), \]

where $m(x) = -(x + 2)/3$ and $n(x) = (1 + x)^2/3$.

3. Conclude that $(1 + x)^2/3$ is the inverse of $x^2 - 1$, modulo $x^3 - 2$.

4. Explain why $x^3 - 2$ is irreducible over $\mathbb{Q}$, but show that it factorizes in
$\mathbb{R}[x]$.


4.4 Algebraic Number Fields 


In the previous section the field of congruence classes of polynomials in $F[x]$,
modulo an irreducible polynomial $p(x)$, was motivated partly by its analogy
with the congruence classes of ordinary integers modulo a prime $p$. In both
cases everything follows from the elementary idea of division with remainder.

In the case where $F = \mathbb{Q}$ there is a very natural interpretation of this field
of congruence classes: as an algebraic number field.


Definitions. An algebraic number $\alpha$ is one$^2$ that satisfies an equation of the
form

\[ a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0, \]

where $a_0, a_1, \ldots, a_n \in \mathbb{Q}$ and $a_n \neq 0$.


Minimal polynomial. If $\alpha$ is an algebraic number then there is a unique
polynomial, of minimal degree, that is satisfied by $\alpha$ and of the form

\[ p(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0, \quad\text{where } a_0, a_1, \ldots, a_{n-1} \in \mathbb{Q}. \]

Also, any polynomial satisfied by $\alpha$ is a multiple of $p(x)$.


2 As is well known, the fundamental theorem of algebra (FTA) implies that any algebraic 
number lies in the field C of complex numbers. We will assume FTA later in this book. 
However, for our present purposes, it will suffice that the number is defined by a polynomial 
equation. Indeed, the existence of a solution to a polynomial equation will emerge algebraically 
from the ring Q[x] without appeal to analysis, unlike proofs of the fundamental theorem of 
algebra.

Proof. Clearly there is a $p(x)$ of minimal degree satisfied by $\alpha$. And we can get
$p(x)$ in the above form by dividing through by the coefficient of the
highest-degree term. Also, $p(x)$ is unique, because if there were a different minimal
polynomial $p^*(x)$, also satisfied by $\alpha$, then $\alpha$ would satisfy $p(x) - p^*(x)$,
which is nonzero and has lower degree.

Finally, suppose that $s(x)$ is any polynomial satisfied by $\alpha$. Then $s(x)$
necessarily has degree $\geq$ that of $p(x)$, so division with remainder by $p(x)$
gives

\[ s(x) = q(x)p(x) + r(x), \quad\text{with degree } r(x) < \text{degree } p(x). \]

It follows that $\alpha$ also satisfies $r(x)$, and hence $r(x) = 0$ by minimality. Thus,
any polynomial $s(x)$ satisfied by $\alpha$ is a multiple of $p(x)$. □


Corollary. The minimal polynomial is irreducible over $\mathbb{Q}$.


Proof. If $p(x)$ is a product $q(x)r(x)$ of polynomials in $\mathbb{Q}[x]$ of lower degree,
then $\alpha$ satisfies one of $q(x) = 0$ or $r(x) = 0$, contrary to minimality. □


Now, despite the fact that we know nothing about the solutions of the
equation $p(x) = 0$, we can construct a field in which this equation has a
solution simply by saying “let $p(x) = 0$.” This is the magic of algebra. More
formally, we consider the congruence classes of $\mathbb{Q}[x]$ modulo $p(x)$. As we
saw in the previous section, these congruence classes form a field in which
(obviously) $p(x) = 0$. Moreover, this field is isomorphic to the fraction field
$\mathbb{Q}(\alpha)$ of $\mathbb{Q}[\alpha]$, for any number $\alpha$ that satisfies $p(x) = 0$.


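Definition. A field $F$ is isomorphic to a field $F'$, written $F \cong F'$, if there is a
one-to-one map $\varphi: F \to F'$, onto $F'$, such that

\[ \varphi(u + v) = \varphi(u) + \varphi(v), \qquad \varphi(u \cdot v) = \varphi(u) \cdot \varphi(v). \]

Such a map $\varphi$ is called a field isomorphism.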


Solution field construction. The field of congruence classes of $\mathbb{Q}[x]$ modulo
an irreducible polynomial $p(x)$ is isomorphic to $\mathbb{Q}[\alpha]$, for any solution $\alpha$ of
the equation $p(x) = 0$. In particular, $\mathbb{Q}[\alpha]$ is a field, and hence equal to $\mathbb{Q}(\alpha)$.


Proof. Since $p(x)$ is irreducible, its congruence classes form a field by the
corollary in the previous section. In this field, the congruence class $[x]$
is a solution of $p(x) = 0$. In fact, the correspondence $[x] \mapsto \alpha$ defines an
isomorphism from the congruence classes mod $p(x)$ to $\mathbb{Q}[\alpha]$, which is equal
to its own fraction field. We unfold these facts in the following steps.

1. The map $[x] \mapsto \alpha$ extends to a map

\[ \varphi: \{\text{congruence classes mod } p(x)\} \to \mathbb{Q}[\alpha] \]

by setting $\varphi([s(x)]) = s(\alpha)$ for any polynomial $s(x) \in \mathbb{Q}[x]$. This map is
well defined because

\[ [s(x)] = [t(x)] \implies p(x) \text{ divides } s(x) - t(x) \implies s(\alpha) = t(\alpha). \]

2. The map $\varphi$ preserves sums and products because

\begin{align*}
\varphi([s(x)] + [t(x)]) &= \varphi([s(x) + t(x)]) && \text{(by congruence class sum)} \\
&= s(\alpha) + t(\alpha) && \text{(by definition of $\varphi$)} \\
&= \varphi([s(x)]) + \varphi([t(x)]) && \text{(by definition of $\varphi$)}
\end{align*}

Similarly for products: just replace the $+$ sign with the $\cdot$ sign.

3. The map $\varphi: \{\text{congruence classes mod } p(x)\} \to \mathbb{Q}[\alpha]$ is onto
$\mathbb{Q}[\alpha]$, because any $s(\alpha) \in \mathbb{Q}[\alpha]$ equals $\varphi([s(x)])$. It is also one-to-one:
if $s(\alpha) = t(\alpha)$ then $\alpha$ satisfies $s(x) - t(x)$, which is therefore a multiple of
$p(x)$ by the minimal polynomial theorem (the irreducible $p(x)$ being, up to a
constant factor, the minimal polynomial of $\alpha$), so $[s(x)] = [t(x)]$. Thus, $\varphi$ is an
isomorphism onto $\mathbb{Q}[\alpha]$ and, since the congruence classes form a field, so
does $\mathbb{Q}[\alpha]$. (Alternatively, we can see that $\mathbb{Q}[\alpha]$ includes the inverse of
each nonzero $s(\alpha)$; namely $t(\alpha)$, where $[t(x)]$ is the inverse of the nonzero
congruence class $[s(x)]$.) □


The field $\mathbb{Q}(\alpha) = \mathbb{Q}[\sqrt{2}]$ is a good illustration of this theorem. We already
know that the domain

\[ \mathbb{Q}[\sqrt{2}] = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\} \]

equals its own fraction field. The theorem shows that $[x] \mapsto \sqrt{2}$ defines an
isomorphism from the congruence classes of $\mathbb{Q}[x]$ modulo $x^2 - 2$ to $\mathbb{Q}[\sqrt{2}]$. In
fact there are two such isomorphisms: $\varphi^+$ sending $[x]$ to $\sqrt{2}$ and $\varphi^-$ sending
$[x]$ to $-\sqrt{2}$, since both $\sqrt{2}$ and $-\sqrt{2}$ are roots of $x^2 - 2 = 0$.
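
The two isomorphisms can be observed numerically. In the sketch below a congruence class $s_0 + s_1[x]$ modulo $x^2 - 2$ is stored as a pair of rationals, and the two maps send $[x]$ to $+\sqrt{2}$ and $-\sqrt{2}$ as floating-point approximations. The names `class_mul`, `phi_plus`, and `phi_minus` are ours, and the comparison is only up to rounding error, so this is an illustration rather than a proof.

```python
import math
from fractions import Fraction

def class_mul(s, t):
    """Multiply classes s0 + s1[x] and t0 + t1[x] modulo x^2 - 2, where [x]^2 = 2."""
    (s0, s1), (t0, t1) = s, t
    return (s0 * t0 + 2 * s1 * t1, s0 * t1 + s1 * t0)

def phi_plus(s):
    """Send [x] to +sqrt(2) (floating-point approximation)."""
    return float(s[0]) + float(s[1]) * math.sqrt(2)

def phi_minus(s):
    """Send [x] to -sqrt(2) (floating-point approximation)."""
    return float(s[0]) - float(s[1]) * math.sqrt(2)

s = (Fraction(1, 2), Fraction(3))
t = (Fraction(-2), Fraction(5, 7))
x = (Fraction(0), Fraction(1))

x_squared = class_mul(x, x)    # the class [x]^2, which equals the class [2]
plus_ok = math.isclose(phi_plus(class_mul(s, t)), phi_plus(s) * phi_plus(t))
minus_ok = math.isclose(phi_minus(class_mul(s, t)), phi_minus(s) * phi_minus(t))
```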


Notation. The field of congruence classes of $\mathbb{Q}[x]$ modulo an irreducible
polynomial $p(x)$ is denoted by $\mathbb{Q}[x]/(p(x))$. (This notation is an instance of a more
general notation, for the quotient of a ring by a principal ideal, that will be
explained in Section 5.3. We already used a simple case of it in Section 3.6.)


4.4.1 Conjugates 


Different solutions $\alpha$ and $\alpha'$ of the same equation $p(x) = 0$, where $p(x)$ is an
irreducible polynomial, are called conjugates of each other. For example, $\sqrt{2}$
and $-\sqrt{2}$ are conjugate solutions of the equation $x^2 - 2 = 0$. We have already
used special cases of this terminology in Chapters 1, 2, and 3.

The above proof that $\mathbb{Q}(\alpha)$ is isomorphic to the field $\mathbb{Q}[x]/(p(x))$ of
congruence classes of $\mathbb{Q}[x]$ modulo $p(x)$ applies equally well to $\mathbb{Q}(\alpha')$,
since $\alpha'$ also satisfies the equation $p(x) = 0$. Thus both $\mathbb{Q}(\alpha)$ and $\mathbb{Q}(\alpha')$ are
isomorphic to the field of congruence classes, and hence they are isomorphic
to each other.

Indeed, the proof shows that if $\alpha_1, \ldots, \alpha_n$ are the roots of $p(x) = 0$, then
the map $\sigma_i: \alpha_1 \mapsto [x] \mapsto \alpha_i$ extends to an isomorphism $\sigma_i: \mathbb{Q}[\alpha_1] \to \mathbb{Q}[\alpha_i]$.
Thus, when the minimal polynomial has $n$ roots, there are $n$ isomorphisms
$[x] \mapsto \alpha_i$ of the field of congruence classes into a field containing all the roots
of $p(x)$ (such as $\mathbb{C}$, by the fundamental theorem of algebra).


Exercises 


The previous sections show that it is fairly easy to handle algebraic numbers
arising from quadratic equations, such as $a + b\sqrt{2}$ where $a, b \in \mathbb{Q}$. For
example, it is quite easy to find the inverse of $a + b\sqrt{2}$. The results of the
present section allow us, in principle, to find inverses of algebraic numbers
of arbitrary degree, by finding inverses of polynomials with the help of the
Euclidean algorithm, as in the previous exercise set. Here is an example arising
from a cubic equation.


1. Explain why $\mathbb{Q}(2^{1/3}) \cong \mathbb{Q}[x]/(x^3 - 2)$.

2. Deduce that the inverse of $2^{2/3} - 1$ corresponds to the inverse of $x^2 - 1$,
modulo $x^3 - 2$.

3. Hence calculate the inverse of $2^{2/3} - 1$ from exercise 4 of Section 1.6,
expressing your answer in the form $a + 2^{1/3}b + 2^{2/3}c$ for some $a, b, c \in \mathbb{Q}$.

4. Find the inverse of $2^{2/3} + 1$ in the form $a + 2^{1/3}b + 2^{2/3}c$ for some
$a, b, c \in \mathbb{Q}$.


4.5 Field Extensions 


When a field $F$ is a subfield of a larger field $E$, we call $E$ an extension field of
$F$, and we sometimes write $F \subseteq E$ or $E \supseteq F$ with the understanding that $F$
is not merely a subset of $E$ but also has the same sums and products. The
properties of an extension field $E \supseteq F$ are often found from knowledge of the
properties of $F$ together with knowledge of the properties of $E$ “relative to” or
(as algebraists say) over $F$.

An example of an extension is the field $\mathbb{Q}(\sqrt{2}) \supseteq \mathbb{Q}$. Since $\sqrt{2}$ is irrational,
$\mathbb{Q}(\sqrt{2})$ is a proper extension of $\mathbb{Q}$. By writing

\[ \mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\} \]

we can see one of the important “relative” properties of $\mathbb{Q}(\sqrt{2})$. It has
dimension 2 over $\mathbb{Q}$ in the sense of linear algebra; namely:


• Every element of $\mathbb{Q}(\sqrt{2})$ is a linear combination, $a + b\sqrt{2}$, of the two
  elements $1, \sqrt{2}$.

• The latter two elements are linearly independent over $\mathbb{Q}$. That is, if the
  linear combination $a + b\sqrt{2} = 0$ with $a, b \in \mathbb{Q}$, then $a = b = 0$ (otherwise
  we would have $\sqrt{2} = -a/b$, which is rational).


Definitions. If $E \supseteq F$ is a field extension, then a basis for $E$ over $F$ is a set
of elements $e_1, \ldots, e_k$ of $E$ such that


• The basis elements span $E$ over $F$; that is, each $e \in E$ has the form
  $e = f_1 e_1 + \cdots + f_k e_k$ for some $f_1, \ldots, f_k \in F$.
• The basis elements are linearly independent over $F$; that is, if
  $0 = f_1 e_1 + \cdots + f_k e_k$ for some $f_1, \ldots, f_k \in F$,
  then $f_1 = \cdots = f_k = 0$.

$E$ is said to be of finite dimension over $F$ if it has a finite basis over $F$.


The elements of a field extension $E \supseteq F$ of finite dimension are called
algebraic over $F$ because of the theorem below. We also say that $E$ itself
is algebraic over $F$, or is an algebraic extension of $F$. The proof assumes the
fact (hopefully familiar, but see Chapter 6 for a proof) that if $E$ has dimension
$n$ over $F$, then any set of $n + 1$ elements of $E$ is linearly dependent over $F$.


Finite-dimensional extensions. If $E$ is of finite dimension $n$ over $F$, then each
$e \in E$ satisfies an equation of the form

\[ x^m + f_{m-1}x^{m-1} + \cdots + f_1 x + f_0 = 0, \]

where $f_0, f_1, \ldots, f_{m-1} \in F$ and $m \leq n$.


Proof. If $e \in E$, then $1, e, \ldots, e^n$ are $n + 1$ elements of $E$, and hence they are
linearly dependent over $F$. That is, they satisfy an equation

\[ f_n e^n + f_{n-1}e^{n-1} + \cdots + f_1 e + f_0 = 0, \]

where $f_0, f_1, \ldots, f_n \in F$ are not all zero. Obviously, $f_0$ cannot be the only
nonzero coefficient. If $e^m$ is the largest power with nonzero coefficient, $f_m$,
then we can divide through by $f_m$ and obtain an equation for $e$ of the form

\[ x^m + f_{m-1}x^{m-1} + \cdots + f_1 x + f_0 = 0, \]

where $f_0, f_1, \ldots, f_{m-1} \in F$ and $m \leq n$. □


Corollary. If $F \supseteq \mathbb{Q}$ is an extension of dimension $n$, then all members of $F$
are algebraic numbers of degree $\leq n$.
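
For the two-dimensional extension $\mathbb{Q}(\sqrt{2}) \supseteq \mathbb{Q}$ the dependence among $1, e, e^2$ can be written down explicitly. The sketch below assumes $e = a + b\sqrt{2}$ is stored as the pair `(a, b)` and verifies, coordinate by coordinate in the basis $1, \sqrt{2}$, that $e^2 - 2ae + (a^2 - 2b^2) = 0$. The function names are ours, and the closed form is a direct computation for this one field rather than a general algorithm.

```python
from fractions import Fraction

def multiply(x, y):
    """Product of p + q*sqrt2 and r + s*sqrt2, using (sqrt2)^2 = 2."""
    (p, q), (r, s) = x, y
    return (p * r + 2 * q * s, p * s + q * r)

def monic_equation(a, b):
    """Coefficients (f0, f1) with e^2 + f1*e + f0 = 0 for e = a + b*sqrt(2)."""
    a, b = Fraction(a), Fraction(b)
    return (a * a - 2 * b * b, -2 * a)

e = (Fraction(3, 2), Fraction(-1, 5))
f0, f1 = monic_equation(*e)
e2 = multiply(e, e)
# Evaluate e^2 + f1*e + f0 in the basis 1, sqrt(2); both coordinates should vanish.
residual = (e2[0] + f1 * e[0] + f0, e2[1] + f1 * e[1])
```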


4.5.1 The Extension Q(a) of Q 


The most common field extension is the field $\mathbb{Q}(\alpha)$ obtained from $\mathbb{Q}$ by
adjoining an algebraic number $\alpha$. Intuitively, this field consists of all numbers
obtainable from $\alpha$ and rational numbers by the operations of $+$, $-$, $\times$, and $\div$.
More formally, if $p(x)$ is the minimal polynomial for $\alpha$, then $\mathbb{Q}(\alpha)$ is the field
we called $\mathbb{Q}[x]/(p(x))$ in the previous section.

The formal construction has the advantage of showing us a basis for $\mathbb{Q}(\alpha)$.


Basis for $\mathbb{Q}(\alpha)$. If the minimal polynomial $p(x)$ for $\alpha$ has degree $n$, then the
elements $1, \alpha, \alpha^2, \ldots, \alpha^{n-1}$ form a basis for $\mathbb{Q}(\alpha)$ over $\mathbb{Q}$.


Proof. The elements $1, \alpha, \alpha^2, \ldots, \alpha^{n-1}$ of $\mathbb{Q}(\alpha)$ correspond to the congruence
classes $1, [x], [x]^2, \ldots, [x]^{n-1}$ of $\mathbb{Q}[x]$ modulo $p(x)$, so it suffices to show that
these congruence classes span and are linearly independent over $\mathbb{Q}$.
To show that they span, suppose that

\[ p(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0, \quad\text{where } a_0, a_1, \ldots, a_{n-1} \in \mathbb{Q}. \]

It follows that, modulo $p(x)$,

\[ x^n \equiv -a_{n-1}x^{n-1} - \cdots - a_1 x - a_0, \]

so the class $[x]^n$ is a linear combination of the classes $1, [x], [x]^2, \ldots, [x]^{n-1}$.
Next, if we multiply the latter congruence by $x$, we get

\[ x^{n+1} \equiv -a_{n-1}x^n - \cdots - a_1 x^2 - a_0 x, \]

showing that the class $[x]^{n+1}$ is a linear combination of the class $[x]^n$ and the
classes $1, [x], [x]^2, \ldots, [x]^{n-1}$, so it is a combination of $1, [x], [x]^2, \ldots, [x]^{n-1}$
alone. Further multiplication by $x$ shows that the same is true of $[x]^{n+2}$,
and so on.

Thus any polynomial in $[x]$ is a linear combination of $1, [x], [x]^2, \ldots, [x]^{n-1}$.

To show that $1, [x], [x]^2, \ldots, [x]^{n-1}$ are linearly independent, suppose on
the contrary that

\[ b_{n-1}[x]^{n-1} + \cdots + b_1[x] + b_0 = 0, \]

for some $b_0, b_1, \ldots, b_{n-1} \in \mathbb{Q}$, not all zero.


This means that $\alpha$ satisfies a polynomial of degree $\leq n - 1$, contrary to the
minimality of $p(x)$. □
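
The spanning argument is effectively an algorithm: each multiplication by $[x]$ shifts the coordinates up one place and then rewrites $[x]^n$ using $p(x)$. A minimal Python sketch, assuming $p(x)$ is monic and given by its lower coefficients $a_0, \ldots, a_{n-1}$ (the function name `power_coordinates` is our own):

```python
from fractions import Fraction

def power_coordinates(p_low, k):
    """Coordinates of [x]^k in the basis 1, [x], ..., [x]^(n-1) modulo the monic
    polynomial x^n + a_(n-1) x^(n-1) + ... + a_0, where p_low = [a_0, ..., a_(n-1)]."""
    n = len(p_low)
    a = [Fraction(c) for c in p_low]
    coords = [Fraction(1)] + [Fraction(0)] * (n - 1)   # the class 1 = [x]^0
    for _ in range(k):
        top = coords[-1]                               # coefficient of [x]^(n-1)
        # Multiply by [x]: shift up, then replace [x]^n by -a_(n-1)[x]^(n-1) - ... - a_0.
        coords = [Fraction(0)] + coords[:-1]
        coords = [c - top * ai for c, ai in zip(coords, a)]
    return coords

# Modulo x^3 - 2 we have [x]^4 = 2[x] and [x]^6 = 4.
fourth = power_coordinates([-2, 0, 0], 4)
sixth = power_coordinates([-2, 0, 0], 6)
```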


This theorem is not as specialized as it looks, because in fact any extension
field $E \supseteq \mathbb{Q}$ of finite dimension is of the form $\mathbb{Q}(\alpha)$. The number $\alpha$ is called a
primitive element for $E$, and we prove its existence in Section 6.5.


Exercises 


The idea of a basis for one field $E$ over another field $F$ will be explored more
thoroughly in Chapter 6, particularly in Section 6.2. However, we can prepare
for the general discussion by looking at the example of $F = \mathbb{Q}(\sqrt{2})$ and its
extension $E$ obtained by adjoining the element $\sqrt{3}$.


1. Show that $\sqrt{3} \notin \mathbb{Q}(\sqrt{2})$ by supposing
$\sqrt{3} = a + b\sqrt{2}$, where $a, b \in \mathbb{Q}$,
and finding a rational expression for $\sqrt{2}$.


Now any field containing $F = \mathbb{Q}(\sqrt{2})$ and $\sqrt{3}$ includes, at a minimum, the
numbers $\alpha + \beta\sqrt{3}$ for $\alpha, \beta \in \mathbb{Q}(\sqrt{2})$. It is also clear that the sum and product
of such numbers is another number of the same form.


2. Show that $\frac{1}{\alpha + \beta\sqrt{3}}$ is also of the form $\gamma + \delta\sqrt{3}$, where $\gamma, \delta \in F$, and hence
that $E = \{\alpha + \beta\sqrt{3} : \alpha, \beta \in F\}$ is the smallest field containing $F$ and $\sqrt{3}$.
3. Explain why 1 and $\sqrt{3}$ form a basis for $E$ over $F$.


Finally, let us switch to viewing $E$ as a field over $\mathbb{Q}$.


4. Show that the numbers $\alpha + \beta\sqrt{3}$ for $\alpha, \beta \in F$ are precisely the numbers
$a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ for $a, b, c, d \in \mathbb{Q}$.
5. By writing $a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ as

\[ (a + b\sqrt{2}) \cdot 1 + (c + d\sqrt{2})\sqrt{3} \]

and using exercise 3, show that $a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6} = 0$ if and only
if $a = b = c = d = 0$.
6. Conclude that $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$ is a basis for $E$ over $\mathbb{Q}$.

In the exercises to Section 6.5 we will investigate the field $E$ further, showing
that $\sqrt{2} + \sqrt{3}$ is a primitive element for $E$ and finding its minimal polynomial.
Since these results require no further theory, the reader may wish to attempt
them now.


4.6 The Integers of an Algebraic Number Field 


The preceding sections have shown that fields are somewhat easier to handle 
than rings. In particular, they afford a nice agreement between the concepts of 
“dimension” and “degree.” In Chapter 6 we will view this agreement in the 
setting of vector spaces, which are based on fields. 

Since we have a clear and simple concept of algebraic number field — 
namely, a finite-dimensional extension of Q — it seems a good idea to place 
rings of algebraic integers inside algebraic number fields. 


Definitions. An algebraic integer is a solution of an equation

\[ x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0, \quad\text{where } a_0, \ldots, a_{n-1} \in \mathbb{Z}. \]

The algebraic integers in an algebraic number field $E$ are called the integers
of the field $E$.


As an example, it follows from these definitions that the roots of $x^3 - 1 = 0$
are algebraic integers. In fact they are integers of the field $\mathbb{Q}(\sqrt{-3})$, since

\[ x^3 - 1 = (x - 1)(x^2 + x + 1) \]

and the roots of $x^2 + x + 1 = 0$ are $\frac{-1 \pm \sqrt{-3}}{2}$. Thus, perhaps surprisingly, the
integers of the field $\mathbb{Q}(\sqrt{-3})$ are not all of the form $a + b\sqrt{-3}$ with $a, b \in \mathbb{Z}$.

Allowing roots of all the polynomials $x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ (the
so-called monic polynomials) to be “integers” begins to make sense when we
observe that they behave as integers should, in the following sense.


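Sum, difference, and product of algebraic integers. If $\alpha$ and $\beta$ are
algebraic integers, then so are $\alpha + \beta$, $\alpha - \beta$, and $\alpha\beta$.


Proof. Suppose that $\alpha$ satisfies an equation $\alpha^k + a_{k-1}\alpha^{k-1} + \cdots + a_1\alpha +
a_0 = 0$ with integer coefficients. This gives

\begin{align*}
\alpha^k &= -a_{k-1}\alpha^{k-1} - \cdots - a_1\alpha - a_0, \\
\alpha^{k+1} &= -a_{k-1}\alpha^k - \cdots - a_1\alpha^2 - a_0\alpha,
\end{align*}

and so on. Thus, $\alpha^k$ is a linear combination of $1, \alpha, \ldots, \alpha^{k-1}$ with integer
coefficients. It follows in turn that so too are $\alpha^{k+1}, \alpha^{k+2}, \ldots$ and hence every
polynomial in $\alpha$ with integer coefficients.

Similarly, if $\beta$ satisfies a monic polynomial equation of degree $l$, then
any polynomial in $\beta$ is a linear combination of $1, \beta, \ldots, \beta^{l-1}$ with integer
coefficients. Therefore, any polynomial in $\alpha$ and $\beta$ is a linear combination
of terms $\alpha^i\beta^j$, with $0 \leq i \leq k - 1$ and $0 \leq j \leq l - 1$, with integer coefficients.

So, if we denote the $kl$ products $\alpha^i\beta^j$ by $\omega_1, \ldots, \omega_{kl}$ we can write each
polynomial $\omega$ in $\alpha$ and $\beta$ (such as $\alpha + \beta$ or $\alpha\beta$) in the form

\[ \omega = n_1\omega_1 + \cdots + n_{kl}\omega_{kl} \quad\text{where } n_1, \ldots, n_{kl} \in \mathbb{Z}. \qquad (*) \]

From this we get $kl$ equations in the $kl$ “unknowns” $\omega_m$, by multiplying (*) by
$\omega_1, \ldots, \omega_{kl}$ and rewriting each right-hand side as a linear combination of the
$\omega_m$ with integer coefficients:

\begin{align*}
\omega\omega_1 &= n^{(1)}_1\omega_1 + \cdots + n^{(1)}_{kl}\omega_{kl} \\
\omega\omega_2 &= n^{(2)}_1\omega_1 + \cdots + n^{(2)}_{kl}\omega_{kl} \\
&\ \ \vdots \\
\omega\omega_{kl} &= n^{(kl)}_1\omega_1 + \cdots + n^{(kl)}_{kl}\omega_{kl}.
\end{align*}

These are $kl$ homogeneous equations in the $kl$ unknowns $\omega_m$, with a
nonzero solution, so their determinant$^3$ must be zero. That is,

\[ \begin{vmatrix}
n^{(1)}_1 - \omega & n^{(1)}_2 & \cdots & n^{(1)}_{kl} \\
n^{(2)}_1 & n^{(2)}_2 - \omega & \cdots & n^{(2)}_{kl} \\
\vdots & \vdots & \ddots & \vdots \\
n^{(kl)}_1 & n^{(kl)}_2 & \cdots & n^{(kl)}_{kl} - \omega
\end{vmatrix} = 0. \]

The determinant is a polynomial in $\omega$ with integer coefficients, whose highest
power $\omega^{kl}$ has coefficient $\pm 1$. Thus we get a monic polynomial equation for
$\omega$, showing that $\omega = \alpha + \beta$, $\alpha - \beta$ or $\alpha\beta$ is an algebraic integer. □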
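
To see the determinant argument produce an actual equation, take $\alpha = \sqrt{2}$ and $\beta = \sqrt{5}$ (our own choice of example), so that $k = l = 2$ and the products $\omega_1, \ldots, \omega_4$ are $1, \sqrt{2}, \sqrt{5}, \sqrt{10}$. The sketch below writes down the integer matrix of the equations for $\omega = \sqrt{2} + \sqrt{5}$ and computes $\det(xI - N)$, which agrees with the determinant in the proof up to sign. Expanding the determinant by the Faddeev-LeVerrier recurrence is a technique we have swapped in for convenience; the text does not prescribe a method.

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(N):
    """Coefficients c[0..n] of det(x*I - N) = c[n] x^n + ... + c[0]."""
    n = len(N)
    A = [[Fraction(v) for v in row] for row in N]
    M = [[Fraction(0)] * n for _ in range(n)]
    c = [Fraction(0)] * n + [Fraction(1)]
    for k in range(1, n + 1):
        # Faddeev-LeVerrier step: M_k = A*M_(k-1) + c_(n-k+1)*I
        M = mat_mul(A, M)
        for i in range(n):
            M[i][i] += c[n - k + 1]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

# Row i expresses omega * omega_i in the basis 1, sqrt2, sqrt5, sqrt10.
N = [
    [0, 1, 1, 0],   # omega * 1      = sqrt2 + sqrt5
    [2, 0, 0, 1],   # omega * sqrt2  = 2 + sqrt10
    [5, 0, 0, 1],   # omega * sqrt5  = 5 + sqrt10
    [0, 5, 2, 0],   # omega * sqrt10 = 5*sqrt2 + 2*sqrt5
]
coeffs = char_poly(N)          # lowest degree first
w = 2 ** 0.5 + 5 ** 0.5        # floating-point value of omega, for a sanity check
```

The coefficients are `[9, 0, -14, 0, 1]`, so $\omega = \sqrt{2} + \sqrt{5}$ satisfies the monic integer equation $x^4 - 14x^2 + 9 = 0$.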


Corollary. The integers of an algebraic number field form a ring. 


Proof. If $\alpha$ and $\beta$ are integers of an algebraic number field $E$, then $\alpha + \beta$,
$\alpha - \beta$, $\alpha\beta \in E$ since $E$ is a field. And these elements are algebraic integers by the
theorem above. Also, the sums, differences, and products of integers in $E$
inherit the ring properties from $E$. □


3 In this chapter and the next, we assume some general properties of determinants. Readers 
probably know these properties from a linear algebra course, but we prove them from scratch in 
Chapter 7, so as to provide a single foundation for several results.

Notation. The ring of integers of a field $E \supseteq \mathbb{Q}$ of finite degree is called $\mathbb{Z}_E$.


One reason for placing algebraic integer rings in algebraic number fields,
and hence bounding their degree, is that if $\alpha$ is an algebraic integer, then so is
$\sqrt{\alpha}$. If we put no bound on degree, each algebraic integer has a “factorization”

\[ \alpha = \sqrt{\alpha}\sqrt{\alpha}. \]

This rules out any chance of primes or prime factorization, which was the
reason for using algebraic integers in the first place. Thus, we need to bound
degree to have any hope of prime factorization. It is still not easy to achieve, as
we will see, but there is hope in the concept of “ideal numbers,” which is the
starting point of our next chapter.


Exercises 


The simplest illustration of the concept of integer in an algebraic number field
$E$ is where $E = \mathbb{Q}$. If this concept of “integer” makes sense, then the ring of
integers of $\mathbb{Q}$ should be $\mathbb{Z}$. To confirm this, suppose that $x \in \mathbb{Q}$ satisfies the
minimal equation

\[ x^k + a_{k-1}x^{k-1} + \cdots + a_1 x + a_0 = 0, \quad\text{where } a_0, \ldots, a_{k-1} \in \mathbb{Z}. \qquad (*) \]


1. Suppose that $x = m/n$ satisfies (*), where $\gcd(m, n) = 1$ and $n > 1$, so
$x \notin \mathbb{Z}$. Deduce that

\[ m^k = -a_{k-1}m^{k-1}n - \cdots - a_1 m n^{k-1} - a_0 n^k, \]

and explain why this is a contradiction.


With the definition of algebraic integer we are now in a position to prove
the result mentioned in the exercises to Section 4.2, that an algebraic number
is the quotient of an algebraic integer by an ordinary integer (a result we will
see again in Section 8.2).


2. Suppose that $\beta \in E$, an algebraic number field of finite degree, and that $\beta$
satisfies the polynomial

\[ a_m x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0 = 0, \quad\text{where } a_0, \ldots, a_m \in \mathbb{Z}. \]

Then $\alpha = a_m\beta \in E$ also. Why?
3. By multiplying the above equation by $a_m^{m-1}$, explain why $\alpha$ satisfies the
monic equation

\[ \alpha^m + a_{m-1}\alpha^{m-1} + a_m a_{m-2}\alpha^{m-2} + \cdots + a_m^{m-2}a_1\alpha + a_m^{m-1}a_0 = 0. \]

4. Deduce that $\alpha$ is an integer of the field $E$, and hence that each $\beta \in E$ is of
the form (integer of $E$)/(ordinary integer).
5. Explain why this shows that $E = \operatorname{Frac}(\mathbb{Z}_E)$.


Finally, let us consider why each algebraic integer has a factorization in the
ring of all algebraic integers.


6. If $\alpha$ is an algebraic integer, so is $\sqrt{\alpha}$. Why? Is $\sqrt[3]{\alpha}$ also an algebraic
integer?

4.7 An Equivalent Definition of Algebraic Integer 


The definition of algebraic integer given in the previous section is suitable
for most purposes in this book, but it has one drawback: it is hard to tell
when a given algebraic number is not an algebraic integer. For example, the
number $\alpha = \sqrt{3}/\sqrt{2}$ is an algebraic number that satisfies the minimal monic
polynomial equation

\[ x^2 - \frac{3}{2} = 0, \]

which does not have integer coefficients. This could mean that $\alpha$ is not an
algebraic integer. But how do we know there is no other monic polynomial
equation for $\alpha$ with integer coefficients?

If there were such a polynomial, $x^2 - 3/2$ would divide it by the minimal
polynomial theorem in Section 4.4. This contradicts the following result, which
comes from Gauss (1801), article 42.


Gauss’s lemma. If a monic polynomial $f(x) \in \mathbb{Z}[x]$ factorizes into $g(x)h(x)$,
where $g(x)$ and $h(x)$ are monic polynomials in $\mathbb{Q}[x]$, then in fact $g(x)$ and $h(x)$
are in $\mathbb{Z}[x]$.


Proof. Following an idea of Dedekind (1871, p. 466), we introduce the
content $\mathrm{cont}(F)$ of a polynomial $F(x) \in \mathbb{Z}[x]$, which is the gcd of the
coefficients of $F(x)$. We first prove that if $G(x), H(x) \in \mathbb{Z}[x]$, then

$$\mathrm{cont}(GH) = \mathrm{cont}(G)\,\mathrm{cont}(H).$$

By first dividing each of $G(x)$ and $H(x)$ by the gcd of their coefficients,
we can suppose that $\mathrm{cont}(G) = \mathrm{cont}(H) = 1$, so it remains to prove that
$\mathrm{cont}(GH) = 1$.

Suppose on the contrary that a prime number $p$ divides $\mathrm{cont}(GH)$, so $p$
divides each coefficient of $G(x)H(x)$, where, say,

$$G(x) = b_r x^r + \cdots + b_1 x + b_0,$$
$$H(x) = c_s x^s + \cdots + c_1 x + c_0.$$

Our assumption $\mathrm{cont}(G) = \mathrm{cont}(H) = 1$ means there is a least $i$ such that $p$
does not divide $b_i$ and a least $j$ such that $p$ does not divide $c_j$. But then $p$ does
not divide the coefficient of $x^{i+j}$ in $G(x)H(x)$, namely,

$$b_0 c_{i+j} + \cdots + b_{i-1} c_{j+1} + b_i c_j + b_{i+1} c_{j-1} + \cdots + b_{i+j} c_0,$$

since $p$ divides each term except $b_i c_j$. This contradiction shows
$\mathrm{cont}(GH) = 1$.

Now we return to the monic $f(x) \in \mathbb{Z}[x]$ that equals $g(x)h(x)$ for monic
$g(x), h(x) \in \mathbb{Q}[x]$. We multiply each of $g(x)$ and $h(x)$ by a positive integer $m$ so that
both $G(x) = mg(x)$ and $H(x) = mh(x)$ are in $\mathbb{Z}[x]$. Then, on the one hand,

$$\mathrm{cont}(GH) = \mathrm{cont}(m^2 f(x)) = m^2, \quad\text{because } f(x) \text{ is monic and in } \mathbb{Z}[x].$$

On the other hand, by the multiplicative property of content just proved,

$$m^2 = \mathrm{cont}(G)\,\mathrm{cont}(H).$$

Now $\mathrm{cont}(G) = \mathrm{cont}(mg) \le m$ because $m$ is the leading coefficient of $G(x)$, and
similarly $\mathrm{cont}(H) = \mathrm{cont}(mh) \le m$. Hence, in fact, $\mathrm{cont}(G) = \mathrm{cont}(H) = m$,
so $m$ divides every coefficient of $G = mg$ and of $H = mh$. That is, $g(x), h(x) \in \mathbb{Z}[x]$. □
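
The multiplicative property of content used in this proof is easy to test on small examples. The sketch below is illustrative only; the helper names `content` and `poly_mul` and the sample polynomials are assumptions, not taken from the text.

```python
from functools import reduce
from math import gcd


def content(poly):
    """gcd of the integer coefficients of a polynomial (list of coefficients)."""
    return reduce(gcd, poly, 0)


def poly_mul(g, h):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(g) + len(h) - 1)
    for i, b in enumerate(g):
        for j, c in enumerate(h):
            out[i + j] += b * c
    return out


G = [6, 10, 4]   # 6x^2 + 10x + 4, content 2
H = [9, 3, 15]   # 9x^2 + 3x + 15, content 3
print(content(G), content(H), content(poly_mul(G, H)))  # 2 3 6
```

A check like this does not replace the proof, but it shows the identity $\mathrm{cont}(GH) = \mathrm{cont}(G)\,\mathrm{cont}(H)$ at work on specific coefficients.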


Corollary. If $\alpha$ satisfies a monic polynomial in $\mathbb{Z}[x]$, then the minimal monic
polynomial $g(x)$ for $\alpha$ in $\mathbb{Q}[x]$ is in $\mathbb{Z}[x]$.

Proof. If $\alpha$ satisfies the monic polynomial $f(x) \in \mathbb{Z}[x]$ then the minimal
monic polynomial $g(x)$ for $\alpha$ in $\mathbb{Q}[x]$ divides $f(x)$ by the minimal polynomial
theorem of Section 4.4. Then $g(x) \in \mathbb{Z}[x]$ by Gauss’s lemma. □


Thus, an equivalent to the definition of algebraic integer given in the 
previous section is the following: An algebraic integer is an algebraic number 
whose minimal polynomial has integer coefficients. 


Exercises 


1. Show that $\alpha$ is not an algebraic integer, but that $\dfrac{1+\sqrt{-3}}{\sqrt{2}}$ is.

Gauss’s lemma is the key to the most useful test for irreducibility of
polynomials in $\mathbb{Q}[x]$, called the Eisenstein irreducibility criterion after its
appearance in Eisenstein (1850) (though it was actually found by Schönemann
a few years earlier). The criterion states that if

$$f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \in \mathbb{Z}[x]$$

and $p$ is a prime that divides each $a_i$ but $p^2$ does not divide $a_0$, then $f(x)$ is
irreducible in $\mathbb{Q}[x]$.

Figure 4.1 Gotthold Eisenstein (1823-1852). Licensed under Creative Commons
Attribution-ShareAlike 4.0 International License.
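
The hypotheses of the criterion are mechanical to check. The following sketch assumes a monic polynomial given as a list of integer coefficients from highest degree to lowest; the function name `satisfies_eisenstein` is hypothetical. Note that the criterion is only a sufficient condition: a `False` result says nothing about reducibility.

```python
def satisfies_eisenstein(coeffs, p):
    """Check the Eisenstein hypotheses for a monic integer polynomial.

    coeffs = [1, a_{n-1}, ..., a_1, a_0] with n >= 1, and p is assumed prime.
    """
    if len(coeffs) < 2 or coeffs[0] != 1:
        raise ValueError("expected a monic polynomial of degree at least 1")
    lower = coeffs[1:]
    divides_all = all(a % p == 0 for a in lower)
    square_free_constant = coeffs[-1] % (p * p) != 0
    return divides_all and square_free_constant


print(satisfies_eisenstein([1, 0, 0, -2], 2))  # True: x^3 - 2 with p = 2
print(satisfies_eisenstein([1, 0, 4], 2))      # False: 4 divides the constant term
```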


2. Suppose that f(x) satisfies the criterion for a prime p but that g(x) and 
h(x) are monic polynomials such that f(x) = g(x)h(x). Deduce that g(x) 
and h(x) are in Z[x]. 

3. Now let $\bar{f}(x), \bar{g}(x), \bar{h}(x)$ be the polynomials resulting from
$f(x), g(x), h(x)$, respectively, by replacing each coefficient by its
congruence class mod $p$. Explain why $\bar{f}(x) = \bar{g}(x)\bar{h}(x)$.

4. Also, $\bar{f}(x) = x^n$ (why?). And unique factorization into irreducibles holds
for polynomials with coefficients in $\mathbb{Z}/p\mathbb{Z}$ (why?). Deduce that $\bar{g}(x) = x^s$
and $\bar{h}(x) = x^t$ for some positive integers $s$ and $t$.

5. Deduce from exercise 4 that all but the leading coefficients of $g(x)$ and
$h(x)$ are divisible by $p$, so $p^2$ divides $a_0$, which is a contradiction.


A famous application of the Eisenstein criterion is to prove irreducibility of
the cyclotomic⁴ polynomial $f(x) = x^{p-1} + x^{p-2} + \cdots + x + 1$, for prime
values $p$. This polynomial is the nontrivial factor of $x^p - 1$, whose factorization
is a special case of the one in Section 1.8.

⁴ The word comes from the Greek for “circle dividing,” and it is used here because the roots of
the polynomial, together with 1, are $p$ equally spaced points on the unit circle in the plane of
complex numbers. We saw examples of circle division in the exercises to Section 2.3.

6. Explain why $f(x) = \dfrac{x^p - 1}{x - 1}$ is irreducible if and only if
$f(y + 1) = \dfrac{(y+1)^p - 1}{y}$ is irreducible.

7. Use the binomial theorem to show that

$$f(y+1) = y^{p-1} + p\,y^{p-2} + \frac{p(p-1)}{2}\,y^{p-3} + \cdots + p.$$

Also explain why all coefficients except the first are divisible by $p$.

8. Deduce, by Eisenstein’s criterion, that $f(y + 1)$ is irreducible, and hence
so is $f(x)$.
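
Exercises 6 to 8 can be explored numerically for a particular prime. The sketch below is an illustration under stated assumptions: the helper names `shifted_cyclotomic` and `horner` are hypothetical, and the coefficients of $f(y+1)$ are taken to be the binomial coefficients $\binom{p}{k}$ for $k = p, p-1, \ldots, 1$, as the binomial theorem gives.

```python
from math import comb


def shifted_cyclotomic(p):
    """Coefficients, highest degree first, of ((y + 1)^p - 1) / y."""
    return [comb(p, k) for k in range(p, 0, -1)]


def horner(coeffs, x):
    """Evaluate a polynomial (highest degree first) at x."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


p = 5
shifted = shifted_cyclotomic(p)
print(shifted)  # [1, 5, 10, 10, 5]

# Sanity check of the substitution x = y + 1 at the sample point y = 2.
assert horner([1] * p, 3) == horner(shifted, 2)

# Eisenstein hypotheses: p divides every coefficient after the first,
# and p^2 does not divide the constant term.
assert all(c % p == 0 for c in shifted[1:])
assert shifted[-1] % (p * p) != 0
```

The same checks pass for other primes $p$, but the code only tests the hypotheses; the reasoning asked for in the exercises is still needed to explain why they always hold.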


4.8 Discussion 


Do you believe in irrational numbers? It depends on how much you need them. 
Analysts need not just individual irrational numbers like $\sqrt{2}$ or $\pi$, but also the
infinite totality of real numbers, R, and the totality of complex numbers C, so 
most analysts have no difficulty believing in irrational numbers. 

In algebra the need for infinite objects is less, and some eminent algebraists, 
such as Kronecker, have tried to avoid them. Of course this is not entirely possible,
since the set N of natural numbers is infinite. But it is, as the Greeks used
to say, only potentially infinite. One can avoid thinking of N as a completed 
whole by viewing it instead as an endless process, producing 0,1,2,3,... 
by repeatedly adding 1. Kronecker was willing to accept potentially infinite 
collections such as N but not actually infinite collections like R and C. 

This makes the so-called fundamental theorem of algebra (FTA) a 
tricky issue for algebraists. As it is normally stated, the FTA says that each 
polynomial equation p(x) = 0 has a root in C. So the very statement of the 
theorem assumes the existence of the actual infinity C. Also, this is the most 
convenient statement of FTA, because it admits an easy proof by standard 
theorems about continuous functions. But it is embarrassing that a supposed 
theorem of algebra is actually a theorem of analysis. 

Kronecker (1887) proposed to replace FTA by what he called the “fundamental
theorem of general arithmetic.” His idea has not been embraced by
mathematicians in general, but it led to the simplest and best description of 
algebraic number fields, which we used in Section 4.4. Let us now revisit the 
FTA, in order to better appreciate Kronecker’s contribution. 


4.8.1 The Fundamental Theorem of Algebra 


The FTA was anticipated by mathematicians for centuries before it was
actually proved. Hope for such a theorem was raised by the 16th-century
solution of cubic equations, mentioned in Section 1.9. This breakthrough was
quickly followed by the solution of quartic (degree 4) equations by Cardano’s
student Ferrari, so it seemed reasonable to expect similar, if more complicated,
solutions for equations of degree 5, 6, and so on.

Figure 4.2 Niels Henrik Abel (1802-1829) and Évariste Galois (1811-1832).
Public domain and courtesy of the Bibliothèque nationale de France, respectively.

What the 16th-century mathematicians could not know was that equations
of degrees 2, 3, and 4 are special. They admit solutions by specific formulas
such as the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \quad\text{when } ax^2 + bx + c = 0,$$

and the Cardano formula for the cubic given in Section 1.9. Formulas of this
kind, involving the coefficients of the equation and the operations $+, -, \times, \div$,
and $n$th roots, are too simple to solve even the general equation of degree 5.
This remarkable result was proved by Abel (1826a), and it was explained in
terms of group theory by the famous theory of Galois a few years later.

But before this happened, Gauss had realized that it was unnecessary to
seek formulas for solving polynomial equations, and better merely to prove
that solutions exist. Between 1799 and 1816 Gauss gave several proofs of FTA
that depend on properties of continuous functions and hence, ultimately, on
the properties of the infinite set R. The simplest property that suffices to prove
FTA is the so-called intermediate value theorem, which states: if $f(x)$ is
continuous for $a \le x \le b$ and $f(a) < 0 < f(b)$, then $f(c) = 0$ for some $c$
between $a$ and $b$.

It follows from the intermediate value theorem that any odd-degree polynomial
equation $p(x) = 0$ has a real root. This is because we can suppose
without loss of generality that the leading term of $p(x)$ is $x^{2m+1}$, in which
case $p(a) < 0$ for large negative $a$ and $p(b) > 0$ for large positive $b$. So, since
any polynomial is continuous, it follows from the intermediate value theorem
that $p(c) = 0$ for some $c$ between $a$ and $b$.
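
The odd-degree argument can be turned into a numerical procedure by repeated bisection. This is a sketch only: bisection is a standard technique swapped in for illustration, not a method described in the text, and the names `horner` and `real_root_odd_degree`, the coefficient bound used to pick the starting interval, and the fixed iteration count are all assumptions. Floating-point arithmetic gives an approximation to the root, not the root itself.

```python
def horner(coeffs, x):
    """Evaluate a polynomial (coefficients highest degree first) at x."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def real_root_odd_degree(coeffs, iterations=200):
    """Approximate a real root of an odd-degree real polynomial by bisection."""
    degree = len(coeffs) - 1
    if degree % 2 == 0 or coeffs[0] == 0:
        raise ValueError("need odd degree and nonzero leading coefficient")
    # Without loss of generality make the leading term x^(2m+1).
    monic = [c / coeffs[0] for c in coeffs]
    # Beyond this bound the leading term dominates, so p(-bound) < 0 < p(bound).
    bound = 1 + max(abs(c) for c in monic[1:])
    a, b = -bound, bound
    for _ in range(iterations):
        c = (a + b) / 2
        if horner(monic, c) < 0:
            a = c   # keep p(a) < 0
        else:
            b = c   # keep p(b) >= 0
    return (a + b) / 2


root = real_root_odd_degree([1, 0, 0, -2])   # x^3 - 2
print(abs(root - 2 ** (1 / 3)) < 1e-9)       # True
```

Each step halves an interval on which the polynomial changes sign, which is exactly where the completeness of R enters: the procedure pins down a real number, but only as the limit of an endless process.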

This argument is the nonalgebraic ingredient in the proof of FTA given 
by Gauss (1816). The remainder of the proof, while complicated, is a purely 
algebraic reduction of a general polynomial equation to one of odd degree 
(and quadratic equations, which of course we know how to solve). But the 
intermediate value theorem is not replaceable by algebra, at least not the kind
of algebra that Kronecker believed in. The existence of the intermediate value 
c depends on the completeness of R: the fact that R has no holes or gaps. 
Obvious though completeness may seem, it implies that R is an actual infinity. 

The problematic nature of R had been a headache for mathematicians ever
since the Pythagoreans discovered that $\sqrt{2}$ is irrational. The most subtle
and difficult part of Euclid’s Elements, Book V, is about getting around the
difficulties posed by irrational quantities. But the depth of the problem did
not become clear until Cantor (1874) proved that R is uncountable. That is, 
it is impossible to arrange the set of real numbers the way we can arrange 
the natural numbers (and also the rational numbers, and even the algebraic 
numbers) as 


first number, second number, third number, ... 


This discovery, which we discuss further in Section 5.7, showed that R cannot 
be merely a “potential” infinity. It is probably no coincidence that Kronecker 
was vehemently opposed to Cantor’s work, and that his campaign to avoid 
infinity began as Cantor’s discoveries became known. 


4.8.2 Algebraic Numbers 


The surprising subtlety of FTA, particularly its entanglement with infinity, 
raised doubts about the meaning and existence of algebraic numbers in the 
minds of Kronecker and others. If there is no simple and uniform formula 
for solving polynomial equations, can there be a simple and uniform way of 
dealing with algebraic numbers? 

Yes! Kronecker realized that the way to handle any algebraic number is by
the solution field construction in Section 4.4. Namely, instead of constructing
the solution $\alpha$ of an irreducible polynomial equation $p(x) = 0$, construct the
solution field $\mathbb{Q}(\alpha)$ as the field of congruence classes of $\mathbb{Q}[x]$ mod $p(x)$. In
this field, $\alpha$ is represented by the congruence class $[x]$, so all we want to
know about $\alpha$, algebraically, can be found by finite calculations with rational
polynomials.

This idea illustrates an approach that is common in modern mathematics:
The behavior of an individual object (such as $\alpha$) may sometimes be best seen
by embedding it in a larger object (such as $\mathbb{Q}[x]/(p(x))$) which has “more
structure” (in this case, the structure of a field).
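
To make “finite calculations with rational polynomials” concrete, here is a minimal sketch of multiplication in $\mathbb{Q}[x]/(p(x))$. The function names and the convention of storing coefficients from lowest degree to highest are assumptions made for this illustration; exact rational arithmetic comes from Python's standard `fractions` module.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials (coefficient lists, lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += Fraction(x) * Fraction(y)
    return out


def reduce_mod(a, p):
    """Remainder of a on division by p, padded to length deg(p)."""
    a = [Fraction(c) for c in a]
    deg_p = len(p) - 1
    while len(a) > deg_p:
        factor = a[-1] / Fraction(p[-1])
        shift = len(a) - 1 - deg_p
        for i, c in enumerate(p):
            a[shift + i] -= factor * c
        a.pop()  # the leading coefficient is now zero
    return a + [Fraction(0)] * (deg_p - len(a))


def mul_in_solution_field(a, b, p):
    """Multiply the classes [a] and [b] in Q[x]/(p(x))."""
    return reduce_mod(poly_mul(a, b), p)


p = [-2, 0, 1]                      # p(x) = x^2 - 2
x = [0, 1]                          # the class [x], playing the role of sqrt(2)
print(mul_in_solution_field(x, x, p) == [2, 0])             # True: [x]^2 = [2]
print(mul_in_solution_field([1, 1], [-1, 1], p) == [1, 0])  # True: (1+[x])(-1+[x]) = [1]
```

The second line shows that $1 + [x]$ has the inverse $-1 + [x]$, found without ever computing a decimal approximation to $\sqrt{2}$.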

Before Kronecker and Dedekind, and before there were doubts about the
nature of R, Cauchy (1847) used a solution field to give meaning to the (then
still dubious) concept of $\sqrt{-1}$. He embedded it as $[x]$ in the field of real
polynomials in $x$ modulo $x^2 + 1$. This field is isomorphic to the field C of
complex numbers. Thus, if one is willing to use R, one should also be willing
to use C.


4.8.3 Algebraic Integers 


The philosophy of “understanding through embedding in a larger structure” is 
implemented all the more when it comes to understanding algebraic integers. 
An algebraic integer is, first of all, an algebraic number, and hence it comes to 
us embedded in a field of algebraic numbers. But which members of the field 
should be regarded as integers? 

In the beginning, Euler (1770) made the lucky guess that numbers of the
form $a + b\sqrt{-2}$, where $a, b \in \mathbb{Z}$, should be regarded as “integers.” In the
1840s, Kummer also guessed correctly that the numbers in $\mathbb{Z}[\zeta_n]$, where

$$\zeta_n = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n},$$

should be considered “integers” (called cyclotomic integers, because they
divide the unit circle into $n$ equal parts), and indeed they are the integers of
the field $\mathbb{Q}(\zeta_n)$.

However, it would be a mistake to suppose that the numbers of the form
$a + b\sqrt{-3}$, where $a, b \in \mathbb{Z}$, are all the integers of the field $\mathbb{Q}(\sqrt{-3})$. In
fact, $\frac{-1+\sqrt{-3}}{2}$ should also be considered an integer of this field. We have
already seen merit in this viewpoint in the exercises to Section 2.9, where it
was shown that unique prime factorization fails in $\mathbb{Z}[\sqrt{-3}]$, but is regained in
$\mathbb{Z}\left[\frac{-1+\sqrt{-3}}{2}\right]$. But why should $\frac{-1+\sqrt{-3}}{2}$ be considered an integer? The number
$\frac{-1+\sqrt{-3}}{2}$ is the same as $\zeta_3$, so this brings us back to the question of why $\zeta_n$
should be considered an integer.

The answer has to do with the equation satisfied by $\zeta_n$, namely $x^n = 1$.
This is a monic polynomial equation with ordinary integer coefficients, and
Eisenstein (1850) proved that the sum, difference, and product of numbers
satisfying such equations is another number of the same kind. We proved this
theorem in Section 4.6. So the numbers satisfying monic polynomials form a
ring, which (from our viewpoint today) is a good sign. Somehow realizing that
such numbers are exactly what we want integers to be, Dedekind (1871, §160)
defined algebraic integers to be the numbers satisfying monic polynomials
in $\mathbb{Z}[x]$.

Later we will see that monic polynomials are the “right” concept to define
“integers” in more general settings.

Preview


As we know from Z and Q, divisibility in rings is more interesting than in
fields, because one nonzero element of a ring usually does not divide another.
This exposes the concepts of primes and unique prime factorization. To study
primes and factorization in Z, we introduced the “microscope” of algebraic
integer rings $\mathbb{Z}_E$, which can see inside ordinary integers by factoring them into
algebraic integers.

But unique prime factorization does not always hold in $\mathbb{Z}_E$, and this reveals
what makes rings more complicated than fields: the presence of substructures
called ideals. Ideals are so called because they realize (among other things)
objects once called “ideal numbers.” These “numbers” were conjectured to
exist by Kummer, in the hope of restoring unique prime factorization in the
rings $\mathbb{Z}_E$. In effect, Kummer was looking for a “stronger microscope” than
$\mathbb{Z}_E$: one that can see inside the algebraic integers themselves.

Dedekind brought Kummer’s “ideal numbers” into reality as ideals. He
then showed, with appropriate concepts of prime ideal and factorization, that
unique prime ideal factorization holds in the algebraic integer rings $\mathbb{Z}_E$.

In this chapter we introduce some ideals and the basic concepts around
them: congruence modulo an ideal, homomorphisms, and the quotient $R/I$
of a ring $R$ by an ideal. We also begin searching for a description of the rings
that admit unique prime ideal factorization, by investigating some ideals in the
rings $\mathbb{Z}_E$.

This leads to the concept of Noetherian rings, which is the first step towards 
describing the right kind of ring for unique prime ideal factorization. 


Figure 5.1 Multiples of $\gcd(2, 1 + \sqrt{-5})$.


5.1 “Ideal Numbers”


We saw in Section 3.2 that prime factorization seems to fail in $\mathbb{Z}[\sqrt{-5}]$, since
6 has the two factorizations

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$$

into numbers that are “prime” in the sense that they are not products of numbers
with smaller norm in $\mathbb{Z}[\sqrt{-5}]$. In such a situation, Kummer believed that the
numbers $2, 3, 1 + \sqrt{-5}, 1 - \sqrt{-5}$ could be further split into “ideal factors” so
that the “ideal factorizations” of $2 \cdot 3$ and $(1 + \sqrt{-5})(1 - \sqrt{-5})$ are identical.

Pursuing this train of thought further, the ideal factors ought to include the
“greatest common divisor” of 2 and $1 + \sqrt{-5}$. And what might the gcd be?
Our study of gcd in Z gives a clue. There we found (Section 1.2) that

$$\gcd(a, b) = ma + nb, \quad\text{for some } m, n \in \mathbb{Z}.$$


This means gcd(a,b) is the least positive integer of the form ma + nb, since 
gcd(a, b) divides all numbers of the form ma + nb. It follows that the numbers 
ma + nb are all the integer multiples of gcd(a, b). 
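
The coefficients $m$ and $n$ can be computed by the extended Euclidean algorithm. The sketch below is a standard formulation of that algorithm, offered as an illustration rather than as the construction of Section 1.2; the function name `extended_gcd` is an assumption.

```python
def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) = m*a + n*b, for positive a and b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder stays an integer combination of a and b.
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n


g, m, n = extended_gcd(240, 46)
print(g, m, n)                 # 2 -9 47
print(m * 240 + n * 46 == g)   # True

# The numbers ma + nb are exactly the multiples of gcd(a, b): a small check.
combos = {m * 12 + n * 18 for m in range(-5, 6) for n in range(-5, 6)}
print(min(c for c in combos if c > 0), all(c % 6 == 0 for c in combos))  # 6 True
```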

We cannot find an actual member of $\mathbb{Z}[\sqrt{-5}]$ to serve as $\gcd(2, 1 + \sqrt{-5})$,
but we can find a subset of $\mathbb{Z}[\sqrt{-5}]$ to serve as the “multiples of
$\gcd(2, 1 + \sqrt{-5})$,” namely

$$I = \{2\mu + (1 + \sqrt{-5})\nu : \mu, \nu \in \mathbb{Z}[\sqrt{-5}]\}.$$

This set, which in fact is $\{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}$, is shown amongst
all the members of $\mathbb{Z}[\sqrt{-5}]$ as the set of black dots in Figure 5.1.

Figure 5.1 confirms that $I$ is not the set of multiples of any member
of $\mathbb{Z}[\sqrt{-5}]$, because the multiples of any $\alpha \in \mathbb{Z}[\sqrt{-5}]$ form a lattice of
rectangles of the same shape as $\mathbb{Z}[\sqrt{-5}]$; namely, magnified by $|\alpha|$ and rotated
by the angle of $\alpha$. The examples in Figures 5.2 and 5.3, whose black dots
are the multiples of 2 and $1 + \sqrt{-5}$, respectively, show such magnified and
rotated copies of $\mathbb{Z}[\sqrt{-5}]$. The black set in Figure 5.1, however, clearly does
not consist of rectangles.

Figure 5.2 Multiples of 2.

Figure 5.3 Multiples of $1 + \sqrt{-5}$.
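
The geometric argument can be complemented by a small computation. The sketch below rests on two assumptions that go beyond the text here: that $a + b\sqrt{-5}$ lies in $I$ exactly when $a - b$ is even (since $2m + (1+\sqrt{-5})n$ has $a = 2m + n$ and $b = n$), and that the norm $a^2 + 5b^2$ is multiplicative. The function names are hypothetical.

```python
def in_I(a, b):
    """True if a + b*sqrt(-5) lies in I = {2m + (1 + sqrt(-5))n : m, n in Z}."""
    return (a - b) % 2 == 0


def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b


# If I were the set of multiples of some g, then g would divide both 2 and
# 1 + sqrt(-5), so norm(g) would divide norm(2) = 4 and norm(1 + sqrt(-5)) = 6,
# and hence divide 2. Elements of norm at most 2 have |a| <= 1 and b = 0,
# so a small search box finds them all.
candidates = sorted((a, b) for a in range(-2, 3) for b in range(-2, 3)
                    if norm(a, b) in (1, 2))
print(candidates)                            # [(-1, 0), (1, 0)]
print([in_I(a, b) for a, b in candidates])   # [False, False]
```

The only candidates are $\pm 1$, and neither lies in $I$, so no single element of $\mathbb{Z}[\sqrt{-5}]$ generates $I$. This agrees with what Figure 5.1 shows.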

Thus, the “ideal number” $\gcd(2, 1 + \sqrt{-5})$ can be realized by its “set of
multiples” $I$, which is a subset of $\mathbb{Z}[\sqrt{-5}]$, but not as a member of $\mathbb{Z}[\sqrt{-5}]$.
This is not such a wild idea. We have already seen, in Section 1.7, that the
natural way to view members of the finite field $\mathbb{F}_p$ is as subsets of Z; namely,
as congruence classes mod $p$. There are no members of Z that behave like the
members of $\mathbb{F}_p$. It was Dedekind’s idea to realize “ideal numbers” as certain
sets, which he called ideals.

Figure 5.4 Multiples of $\gcd(3, 1 + \sqrt{-5})$.


Exercises 


We stumbled on the “ideal number” $I$ in the course of trying to explain the
two “prime” factorizations of 6 in $\mathbb{Z}[\sqrt{-5}]$, $2 \cdot 3$ and $(1 + \sqrt{-5})(1 - \sqrt{-5})$.
Namely, $I$ represents the “greatest common divisor” of 2 and $1 + \sqrt{-5}$. This
prompts us to investigate the “greatest common divisor” $J$ of 3 and $1 + \sqrt{-5}$,
and to compare it with $I$.

1. By writing $\mu = a + b\sqrt{-5}$ and $\nu = c + d\sqrt{-5}$, where $a, b, c, d \in \mathbb{Z}$,
check that

$$\{2\mu + (1 + \sqrt{-5})\nu : \mu, \nu \in \mathbb{Z}[\sqrt{-5}]\} = \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}.$$

2. Similarly, check that

$$\{3\mu + (1 + \sqrt{-5})\nu : \mu, \nu \in \mathbb{Z}[\sqrt{-5}]\} = \{3m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\},$$

so that $\{3m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}$ is also an “ideal number” $J$ of
$\mathbb{Z}[\sqrt{-5}]$ (which we might call “the multiples of $\gcd(3, 1 + \sqrt{-5})$”).

3. Compare the picture of $J$ in Figure 5.4 with that of the ideal $I$ in Figure
5.1. Explain why $J$ is also not the set of multiples of any member of
$\mathbb{Z}[\sqrt{-5}]$.

4. The set $I$ can be considered to contain a “half” of the points in $\mathbb{Z}[\sqrt{-5}]$.
Explain why $J$ can be considered to contain a “third” of these points.


5.2 Ideals 


Having found that “ideal numbers” should be regarded as subsets of a ring, the 
question is: what kind of subset? 


Definition. An ideal $I$ in a ring $R$ is a subset of $R$ with the properties

$$a, b \in I \Rightarrow a + b \in I, \qquad a \in I \text{ and } r \in R \Rightarrow ar \in I.$$

In other words, $I$ is a subset of $R$ closed under addition, and under
multiplication by elements of $R$. It follows that $I$ is also closed under additive
inverse, because $-1 \in R$ and hence $a(-1) = -a \in I$ for any $a \in I$. This
implies in turn that $I$ is closed under differences. The set we used in the
previous section to represent the “ideal number” $\gcd(2, 1 + \sqrt{-5})$ is easily
seen to be an ideal by this definition. We can also see that the “ideal numbers”
of Z correspond to the ordinary integers.


Ideals in Z. Each ideal in Z has the form $\{na : n \in \mathbb{Z}\}$, for some $a \in \mathbb{Z}$.

Proof. If $I = \{0\}$ we can take $a = 0$. If $I \neq \{0\}$, let $a$ be the least positive
member of $I$. Then $I$ includes all multiples $an$ for $n \in \mathbb{Z}$, by closure under sums
and additive inverses. If $I$ also includes some $b$ that is not a multiple of $a$, and if
$am$ is the greatest multiple of $a$ less than $b$, then $I$ also includes $b - am$.

But $0 < b - am < a$ by the division property of Z (Section 1.1), so we have
a contradiction with the choice of $a$. □


An ideal of the form $\{ar : r \in R\}$ for some $a \in R$ is called the principal
ideal generated by $a$, and is often written $(a)$. Domains such as Z, whose
ideals are all principal, are called principal ideal domains, or PIDs. We saw
in the previous section that $\mathbb{Z}[\sqrt{-5}]$ is not a PID, and that one nonprincipal
ideal in $\mathbb{Z}[\sqrt{-5}]$ is $\{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}$.

The latter ideal has two generators, 2 and $1 + \sqrt{-5}$, and we will see later
that each ideal in $\mathbb{Z}[\sqrt{-5}]$ has a finite number of generators (one or two). Rings
in which each ideal is finitely generated are in a sense the next best thing to
PIDs, and we study them further in Section 5.4.

Coming back to ideals in general, for any ideal $I$ in a ring $R$ we have a
concept of congruence modulo $I$. This leads to congruence classes, which
themselves form a ring when added and multiplied in the obvious way. The
ring of congruence classes, like other rings or fields of congruence classes we
have seen, is a quotient of the ring $R$.


Definition and notation. Elements $a$ and $b$ of a ring $R$ are called congruent
modulo an ideal $I$ if $a - b \in I$. We write $a \equiv b \pmod{I}$.


The ring of congruence classes. Congruence modulo $I$ is an equivalence
relation, whose congruence classes $[a]$ for the $a \in R$ form a ring under
addition and multiplication defined by

$$[a] + [b] = [a + b], \qquad [a][b] = [ab].$$

Proof. That congruence modulo $I$ is an equivalence relation follows easily
from the closure properties of $I$:

• Each $a \equiv a \pmod{I}$ because $a - a = 0 \in I$.

• If $a \equiv b \pmod{I}$, then $a - b \in I$, hence also $b - a \in I$, so $b \equiv a \pmod{I}$.

• If $a \equiv b \pmod{I}$ and $b \equiv c \pmod{I}$, then $a - b, b - c \in I$. Hence, their
sum $a - c \in I$, which means $a \equiv c \pmod{I}$.

Consequently, $R$ is partitioned into equivalence classes $[a]$, and we have to
check that the operations $+$ and $\cdot$ are well defined on these equivalence classes.
The proof is the same as that for congruence classes in Z (Section 1.7), which
used only the ring properties of sum and product.

Finally, the $+$ and $\cdot$ operations on congruence classes inherit the ring
properties from $R$, exactly as they did for Z. □


We call the ring of congruence classes modulo $I$ the quotient of $R$ by $I$,
and denote it by $R/I$. The quotient ring is just part of a circle of ideas we
investigate further in the next section.


5.2.1 PIDs and Unique Prime Factorization 


Before we go more deeply into the nature of ideals, particularly nonprincipal
ideals, we point out that PIDs always have unique prime factorization, so they
have no need for “ideal numbers.”

In a PID $R$ we call an element $p \in R$ a prime if all divisors of $p$ are of the
form $u$ or $up$, where $u$ is a divisor of 1 (a “unit”). Divisors of an element $r$ not
of the form $u$ or $ur$ will be called proper divisors of $r$. The proofs of existence
and uniqueness of prime factorization are like those for Z, except that we have
no Euclidean algorithm and no measure of the “size” of members of $R$.


Unique prime factorization in a PID. In a PID, each element has a prime
factorization, which is unique up to the order of the primes and unit factors.


Proof. (Existence). Given a PID $R$, take any element $r_1 \in R$. If $r_1$ is a prime,
we are done. If not, $r_1$ has a proper divisor $r_2$, and we repeat the argument
with $r_2$. Continuing in this way, we obtain a sequence of elements of $R$,

$$r_1, r_2, r_3, \ldots$$

each of which is a proper divisor of the one before. This gives an increasing
sequence of principal ideals (an ascending chain, as we will call it later),

$$(r_1) \subset (r_2) \subset (r_3) \subset \cdots,$$

whose union is also an ideal. Then, since $R$ is a PID, there is an $s_1 \in R$ such
that

$$(r_1) \cup (r_2) \cup (r_3) \cup \cdots = (s_1).$$

It follows that $s_1 \in (r_n)$ for some $n$, so $r_n$ is a divisor of $s_1$. But no proper
divisors of $s_1$ are in $(s_1)$, so in fact $(r_n) = (s_1)$ and hence the sequence of
proper divisors $r_1, r_2, r_3, \ldots$ ends with $r_n$. Thus, $r_n = s_1$ has no proper divisors
and hence is prime.

Having found one prime divisor $s_1$ of $r_1$, we similarly find a prime divisor $s_2$
of $r_1/s_1$, and so on. This process terminates for a similar reason (considering
the ideals $(r_1) \subset (r_1/s_1) \subset (r_1/s_1 s_2) \subset \cdots$), so we get a prime factorization
of any element of $R$.

(Uniqueness). As usual, it suffices to prove the prime divisor property: if a
prime $p$ divides a product $ab$, then $p$ divides $a$ or $p$ divides $b$. So we suppose
that $p$ divides $ab$, but $p$ does not divide $a$. As always, the latter assumption
means that units are the only common divisors of $a$ and $p$.

Now the set $\{ma + np : m, n \in R\}$ is an ideal, hence of the form $(c)$ since
$R$ is a PID. But then $c$ divides both $a$ and $p$, so $c$ is a unit and $(c) = R$. In
particular,

$$1 = ma + np \quad\text{for some } m, n \in R.$$

We are now back on familiar ground: multiply both sides of this equation by $b$,

$$b = mab + npb,$$

observe that $p$ divides both terms on the right, so $p$ divides $b$, as required. □


Exercises 


1. Generalize the proof that Z is a PID to show that any Euclidean domain 
(Section 1.6) is a PID. 

2. Adapt the proof above to show that any PID $R$ satisfies the ascending
chain condition: if $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ are ideals of a PID $R$, then the
sequence $I_1, I_2, I_3, \ldots$ is eventually constant.

3. Give examples of ideals $I$ in the rings Z and Q[x] where $R/I$ is a field,
and others where $R/I$ is not a field.


5.3 Quotients and Homomorphisms 


The relation of congruence modulo an ideal is a characteristic of rings that 
they do not share with fields, since the only ideals of a field F are (0) and F 
itself. We therefore introduce some language and terminology for discussing 
the notions around ideals and congruence in rings. 


Definitions and notation. The ring of congruence classes modulo $I$ in $R$ is
called the quotient of $R$ by $I$ and is denoted by $R/I$. The map

$$\varphi : R \to R/I \quad\text{defined by}\quad \varphi(a) = [a],$$

where $[a]$ is the congruence class¹ of $a$ mod $I$, is called the canonical
homomorphism of $R$ onto $R/I$.

A ring homomorphism in general is a map of rings $f : R \to R'$ such
that $f(a + b) = f(a) + f(b)$ and $f(a \cdot b) = f(a) \cdot f(b)$. Its kernel is
$\{a \in R : f(a) = 0\}$.

The next theorem shows that any ring homomorphism is essentially the
canonical homomorphism for an ideal $I$, with kernel $I$. Readers who know
group homomorphisms and their kernels will notice the analogy between rings
and groups in the theorem below: just as the kernel of a group homomorphism
is a normal subgroup, the kernel of a ring homomorphism is an ideal.

¹ It would be more precise to write this congruence class as $[a]_I$, to show its dependence on $I$,
but it is usually obvious which ideal defines the congruence.


Homomorphisms and ideals. If $f : R \to R'$ is a ring homomorphism onto
$R'$, then $I = \{a \in R : f(a) = 0\}$ is an ideal and the elements of $R'$ correspond
to the congruence classes of $R$ mod $I$.

Proof. By definition of $I$,

$$\begin{aligned}
[a] = [b] &\Leftrightarrow a - b \in I \Leftrightarrow f(a - b) = 0\\
&\Leftrightarrow f(a) - f(b) = 0 \quad\text{(because $f$ is a homomorphism)}\\
&\Leftrightarrow f(a) = f(b).
\end{aligned}$$

Thus the congruence classes mod $I$ correspond to elements of $R'$.
The equations $f(a + b) = f(a) + f(b)$ and $f(ab) = f(a)f(b)$ are
equivalent to

$$[a + b] = [a] + [b] \quad\text{and}\quad [ab] = [a] \cdot [b],$$

which we proved to hold in the previous section. □


Our results about congruence in Z and other rings can be, and often are, 
written in the language of homomorphisms, ideals, and quotients. 

The congruence classes of integers mod $a$ are the congruence classes
modulo the principal ideal $\{an : n \in \mathbb{Z}\}$, which we denote by $a\mathbb{Z}$. The ring of
these classes is therefore the quotient ring $\mathbb{Z}/a\mathbb{Z}$. In particular, the finite field
$\mathbb{F}_p$ equals $\mathbb{Z}/p\mathbb{Z}$. (These are the notations we foreshadowed in Section 3.6.)
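
The difference between a quotient $\mathbb{Z}/a\mathbb{Z}$ that is a field and one that is not can be seen by listing invertible classes. The following sketch is illustrative; the function names are hypothetical, it relies on the standard fact that $[q]$ is invertible mod $a$ exactly when $\gcd(q, a) = 1$, and the three-argument `pow` with exponent $-1$ requires Python 3.8 or later.

```python
from math import gcd


def invertible_classes(a):
    """Representatives q in 1..a-1 whose class [q] has an inverse in Z/aZ."""
    return [q for q in range(1, a) if gcd(q, a) == 1]


def quotient_is_field(a):
    """True if every nonzero class of Z/aZ is invertible."""
    return a > 1 and len(invertible_classes(a)) == a - 1


print(quotient_is_field(7))    # True
print(quotient_is_field(6))    # False
print(invertible_classes(6))   # [1, 5]
print(pow(3, -1, 7))           # 5, since 3 * 5 = 15 leaves remainder 1 mod 7
```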

We could also write the quotient Z/aZ as Z/(a), since aZ is the principal 
ideal generated by a, which we write as (a). An instance where the principal 
ideal notation is usual is the ring of congruence classes of polynomials in Q[x] 
modulo an irreducible polynomial p(x), which we introduced in Section 4.4. 
It is usual to denote this ring by Q[x]/(p(x)). 

We will see in the next section that when the quotient of a ring by an ideal
is a field, as is the case for $\mathbb{Z}/(p)$ and $\mathbb{Q}[x]/(p(x))$, then the ideal is maximal.
That is, there is no larger ideal except the whole ring. To see why the ideal $p\mathbb{Z}$
is maximal in Z, consider an ideal $I$ containing $p\mathbb{Z}$ and some integer $q$ not a
multiple of $p$. Then there are integers $m$ and $n$ such that

$$1 = \gcd(p, q) = mp + nq,$$

and $mp + nq \in I$ because $I$ is closed under sums and differences. But then $I$
contains all integer multiples of 1, so $I = \mathbb{Z}$.

Exercises


The congruence classes of $\mathbb{Z}[\sqrt{-5}]$ modulo the ideals $I$ and $J$ of the previous
section provide interesting examples of quotient rings and maximality.

1. Explain why there are two congruence classes of $\mathbb{Z}[\sqrt{-5}]$ modulo $I$;
namely, the classes $[0]$ and $[1]$.

2. Deduce that $\mathbb{Z}[\sqrt{-5}]/I$ is a ring with two elements.

3. Explain why this ring is the field $\mathbb{F}_2$.

4. Explain why there are three congruence classes of $\mathbb{Z}[\sqrt{-5}]$ modulo $J$;
namely, the classes $[0]$, $[1]$, and $[-1]$.

5. Explain why the ring $\mathbb{Z}[\sqrt{-5}]/J$ is the field $\mathbb{F}_3$ with three elements.

In the next section we will see that this means $I$ and $J$ are maximal ideals.
However, in the case of $I$ and $J$, we can also prove maximality directly.

6. Show that any ideal of $R = \mathbb{Z}[\sqrt{-5}]$ containing $I$ and some $x \notin I$ is the
whole of $R$.

7. Similarly, show that any ideal of $R$ containing $J$ and some $x \notin J$ is the
whole of $R$.


5.4 Noetherian Rings 


In a general ring the argument that maximality of an ideal is equivalent to the 
quotient being a field goes as follows. 


Characterization of a maximal ideal. An ideal $I$ in a ring $R$ is maximal if
and only if $R/I$ is a field.

Proof. By the following sequence of equivalences:

$$\begin{aligned}
I \text{ is maximal} &\Leftrightarrow R = (\text{ideal containing } I \text{ and any } q \notin I)\\
&\Leftrightarrow 1 \in (\text{ideal containing } I \text{ and any } q \notin I)\\
&\Leftrightarrow \text{for any } q \notin I \text{ there is } r \in R \text{ with } rq \equiv 1 \pmod{I}\\
&\Leftrightarrow \text{for any } [q] \neq [0] \text{ there is } r \in R \text{ with } [r][q] = [1]\\
&\Leftrightarrow \text{any } [q] \neq [0] \text{ has an inverse, mod } I\\
&\Leftrightarrow R/I \text{ is a field,}
\end{aligned}$$

because we have already proved in Section 5.2 that $R/I$ is a ring. □

This theorem enables us to see that the nonprincipal ideal in $\mathbb{Z}[\sqrt{-5}]$,

$$I = \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}[\sqrt{-5}]\},$$

which we called the “multiples of $\gcd(2, 1 + \sqrt{-5})$” in Section 5.1, is a
maximal ideal of $\mathbb{Z}[\sqrt{-5}]$. This is because it is clear from Figure 5.1 that there
are just two congruence classes mod $I$ in $\mathbb{Z}[\sqrt{-5}]$: the class $[0]$, which is $I$
itself, and the class $[1]$, which is the complement of $I$. Therefore, the quotient
$\mathbb{Z}[\sqrt{-5}]/I$ is a ring with two elements, which is necessarily the field $\mathbb{F}_2$, since
the only nonzero element, $[1]$, is self-inverse.
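
The two-class structure can also be checked by computation. The sketch below assumes that $a + b\sqrt{-5}$ lies in $I$ exactly when $a - b$ is even, so that the class of an element is recorded by $(a - b) \bmod 2$; the pair representation and the function names `cls`, `add`, and `mul` are hypothetical. It tests, on a finite box of elements only, that this assignment respects sums and products.

```python
from itertools import product


def cls(x):
    """Class of a + b*sqrt(-5) modulo I, as 0 or 1 (assumes I = {a - b even})."""
    a, b = x
    return (a - b) % 2


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)) = (ac - 5bd) + (ad + bc)*sqrt(-5)."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)


box = list(product(range(-3, 4), repeat=2))
sums_ok = all(cls(add(x, y)) == (cls(x) + cls(y)) % 2 for x in box for y in box)
prods_ok = all(cls(mul(x, y)) == (cls(x) * cls(y)) % 2 for x in box for y in box)
print(sums_ok, prods_ok)   # True True
```

A finite check is not a proof, but it is consistent with the map $a + b\sqrt{-5} \mapsto (a - b) \bmod 2$ being a homomorphism onto $\mathbb{F}_2$ with kernel $I$.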

Every ring in which $0 \neq 1$, called a nonzero ring, has a maximal ideal.
The proof, due to Krull (1929), is rather subtle. It is easier to exhibit a ring
containing a collection of ideals with no maximal member. For example, in the
ring of all algebraic integers one has the increasing sequence of ideals

$$\{\text{multiples of } 2^{1/2}\} \subset \{\text{multiples of } 2^{1/4}\} \subset \{\text{multiples of } 2^{1/8}\} \subset \cdots$$

which obviously has no maximal member. Thus the ring of all algebraic
integers has infinite ascending chains of ideals, for much the same reason
that it does not have “primes,” already noted in Section 4.6.


Definition. A ring R is Noetherian if every nonempty set C of ideals of R has 
a maximal member; that is, one not properly contained in any other member. 


This concept is named after the great algebraist Emmy Noether. She 
introduced an equivalent defining property (the ascending chain condition 
below) in Noether (1921). The Noetherian property is vital for the rings 
we study in our pursuit of unique prime ideal factorization. Its importance 
is clearer when one sees some equivalent statements of it, which relate 
containment to finite generation. To distinguish clearly between containment
relations, we write $A \subset B$ or $B \supset A$ for “$A$ is contained in but not equal to $B$,”
and $A \subseteq B$ or $B \supseteq A$ for “$A$ is contained in or equal to $B$.”


Equivalents of the Noetherian property. A ring R is Noetherian if it has any 
of the following three properties: 


1. Any nonempty set of ideals of $R$ has a maximal member.

2. Any nondecreasing sequence $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ of ideals of $R$ eventually
becomes constant. (Ascending chain condition, ACC.)

3. Any ideal $I$ of $R$ is finitely generated; that is, there are $e_1, \ldots, e_n \in I$ such
that $I = \{r_1 e_1 + \cdots + r_n e_n : r_1, \ldots, r_n \in R\}$.


Proof. We prove the equivalence of 1, 2, and 3 by showing $1 \Rightarrow 3 \Rightarrow 2 \Rightarrow 1$.

Figure 5.5 Emmy Noether (1882-1935) (public domain).


$1 \Rightarrow 3$. Let $I$ be any ideal of $R$ and consider the collection of finitely
generated ideals $\subseteq I$. This collection is nonempty, since it includes the zero
ideal, so by 1 it has a maximal member, $M$. If $M \ne I$, there is a $q \in I$ not in $M$.
But then $q$ and the generators of $M$ define a finitely generated ideal $J \subseteq I$ with
$J \supset M$, contrary to the maximality of $M$. Hence, in fact $M = I$, so $I$ is
finitely generated.

$3 \Rightarrow 2$. Let $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ be a nondecreasing sequence of ideals. The
union $I$ of this sequence is also an ideal, hence it has a finite set of generators
$e_1, \ldots, e_n$ by 3. Each $e_i \in$ some $I_j$, hence all of $e_1, \ldots, e_n \in$ some $I_k$. But then
the sequence $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ is constant from $I_k$ onwards.

$2 \Rightarrow 1$. Suppose that $\mathcal{C}$ is a nonempty set of ideals of $R$ with no maximal member. In
other words, for any $I \in \mathcal{C}$ there is an $I' \in \mathcal{C}$ with $I \subset I'$.

Choose any $I_1 \in \mathcal{C}$. Then $I_1$ is not maximal, so we can choose $I_2 \in \mathcal{C}$
with $I_1 \subset I_2$. Likewise, $I_2$ is not maximal, so we can choose $I_3 \in \mathcal{C}$ with
$I_2 \subset I_3$. Continuing in this way, we obtain an infinite ascending chain of
ideals, contrary to ACC. Therefore, every nonempty set of ideals in $R$ has a
maximal member. $\square$


It is common in algebra books to prove the existence of maximal elements 
with the help of a principle of set theory known as Zorn’s lemma, which is 
equivalent to the axiom of choice. In the last step of the proof above we also
used choice, because we were not able to define the sequence $I_1, I_2, I_3$, and so
on. But this choice principle, called dependent choice, involves only a sequence
of choices, each depending on the one before, and it is simpler and arguably
more plausible. The dependent choice principle is known to be weaker than the full
axiom of choice, or Zorn’s lemma, but like them, it is not provable from the
other axioms of set theory.


Exercises 


Zorn’s lemma is about a collection $\mathcal{C}$ of sets, the containment relation $\subseteq$, and
subcollections $\mathcal{B}$ that are linearly ordered by containment; that is, if $B_i, B_j$ are
in $\mathcal{B}$, then either $B_i \subseteq B_j$ or $B_j \subseteq B_i$. The lemma states that $\mathcal{C}$ includes a
maximal element if each linearly ordered subcollection $\mathcal{B}$ has an upper bound
$B$ in $\mathcal{C}$; that is, $B \supseteq$ each $B_i$ in $\mathcal{B}$.

One of the simplest cases where we really need Zorn’s lemma is the proof of
Krull’s theorem that any ring containing nontrivial ideals contains a maximal
ideal. Suppose $R$ is a ring containing ideals $I$ other than $(0)$ and $R$ itself. Let
$\mathcal{I}$ be the collection of all such ideals in $R$. To apply Zorn’s lemma, we have to
show that any linearly ordered subset of $\mathcal{I}$ is contained in a member of $\mathcal{I}$.


1. If $\mathcal{B}$ is a linearly ordered subset of $\mathcal{I}$, prove that the union $B$ of all
members of $\mathcal{B}$ is an ideal. Where did you use the linear ordering?


Obviously $B \ne (0)$, but why is $B \ne R$? Here is a hint.


2. Explain why no $I \in \mathcal{I}$ includes the element 1, and deduce that $B \in \mathcal{I}$.
3. Conclude, by Zorn’s lemma, that $R$ contains a maximal ideal.


5.5 Noether and the Ascending Chain Condition 


The definition of the Noetherian property in terms of maximality is not given 
in the founding paper on the subject, Noether (1921). In fact, Noether proves 
the equivalence of the other two statements of the condition, thereby avoiding 
obvious use of a maximal principle such as Zorn’s lemma. Her starting point 
is the assumption that ideals are finitely generated, a condition inspired by 
the famous finite basis theorem of Hilbert (1890). In her first theorem (the
“theorem of the finite chain”) she proves that the finite generation of ideals
implies the ascending chain condition, as in the step $3 \Rightarrow 2$ in the proof above.

Then she notes without proof that finite generation of ideals follows from 
the ascending chain condition, so the latter can also be taken as a finiteness 
condition “in basis-free form.” The proof is short and almost obvious, but it is 
worth stating explicitly, because it too involves dependent choice. 
ACC implies ideals are finitely generated. If $R$ is a ring satisfying ACC,
then any ideal $I$ of $R$ is finitely generated.


Proof. Suppose $I$ is an ideal of $R$ that is not finitely generated. Choose an
element $e_1 \in I$. Then the ideal $I_1$ generated by $e_1$ is not all of $I$, so we can
choose an $e_2 \in I$ not in $I_1$. Then the ideal $I_2$ generated by $e_1, e_2$ is also not all
of $I$, so we can choose an $e_3 \in I$ that is not in $I_2$.

Continuing in this way, we obtain a strictly increasing sequence of ideals
$I_1 \subset I_2 \subset I_3 \subset \cdots$, contrary to the ascending chain condition.

Hence, in fact $I$ is generated by some finite set of elements (a “basis”). $\square$
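The choosing process in this proof can be watched in the ring $\mathbb{Z}$, where it must stop after finitely many steps. The Python sketch below is an illustration under stated assumptions: it uses the standard fact that the ideal of $\mathbb{Z}$ generated by $e_1, \ldots, e_k$ consists of the multiples of their gcd, and the function names and the sample listing of members of the ideal of multiples of 6 are made up for the example.

```python
from math import gcd

def in_ideal(e, d):
    """Is e in the ideal of Z consisting of the multiples of d? (d = 0 gives the zero ideal.)"""
    return e == 0 if d == 0 else e % d == 0

def greedy_generators(listing):
    """Scan a listing of members of an ideal of Z, keeping each element that is
    not in the ideal generated by the elements kept so far (as in the proof)."""
    generators = []
    d = 0  # the ideal generated by the kept elements is the set of multiples of d
    for e in listing:
        if not in_ideal(e, d):
            generators.append(e)
            d = gcd(d, e)
    return generators, d

# Hypothetical listing of some members of the ideal of multiples of 6.
generators, d = greedy_generators([36, 72, 24, 48, 6, 30, 18])
# The kept elements give the strictly increasing chain (36), (36, 24), (36, 24, 6),
# that is, multiples of 36, then of 12, then of 6.
```

Here only three elements are ever chosen, and the chain stops at the multiples of 6, as ACC demands.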


5.5.1 Hilbert’s Basis Theorem 


The Hilbert basis theorem will not be used later in this book; nevertheless, 
it is worth giving a proof, because of the influence the theorem had on 
Emmy Noether and the formulation of ACC in particular. Hilbert’s proof, in 
Hilbert (1890), implicitly used ACC. It also used dependent choice, in order 
to obtain an infinite sequence that was not obviously definable. This provoked 
the criticism that Hilbert was doing “not mathematics, but theology.” Modern 
versions of the theorem mention the Noetherian assumption explicitly in both 
the statement and the proof, and they tend to downplay Hilbert’s original goal — 
which was to show the existence of a finite “basis” (meaning generating set) 
for any ideal in certain polynomial rings. 


Hilbert’s basis theorem. If $R$ is a Noetherian ring, then the ring $R[x]$ is also
Noetherian.


Proof. Suppose, for the sake of contradiction, that an ideal $I \subseteq R[x]$ is not
finitely generated. Thus, for any polynomials $f_0, \ldots, f_n \in I$, the ideal

$(f_0, \ldots, f_n) = \{r_0 f_0 + \cdots + r_n f_n : r_0, \ldots, r_n \in R[x]\}$

is not the whole of $I$, and hence there is a polynomial $f_{n+1}$ in the set difference

$I - (f_0, \ldots, f_n)$.

Then, using dependent choice, we can choose a sequence of polynomials
$f_0, f_1, f_2, \ldots \in I$ so that

$f_0 \in I$ is of minimal degree,
$f_1 \in I - (f_0)$ is of minimal degree,
$f_2 \in I - (f_0, f_1)$ is of minimal degree,
and so on.

Since we choose minimal degree at each step,

$\deg(f_0) \le \deg(f_1) \le \deg(f_2) \le \cdots$.

Now let $a_n$ be the coefficient of the highest power of $x$ in $f_n$, and consider
the ideals in $R$:

$(a_0) \subseteq (a_0, a_1) \subseteq (a_0, a_1, a_2) \subseteq \cdots$.

Since $R$ is Noetherian, this ascending chain of ideals becomes constant, say at
$(a_0, \ldots, a_{N-1})$. Now $a_N \in (a_0, \ldots, a_{N-1})$, so

$a_N = r_0 a_0 + \cdots + r_{N-1} a_{N-1}$ for some $r_0, \ldots, r_{N-1} \in R$.

This leads us to consider the polynomial

$g = r_0 x^{\deg(f_N) - \deg(f_0)} f_0 + \cdots + r_{N-1} x^{\deg(f_N) - \deg(f_{N-1})} f_{N-1}$,

whose highest-power term

$(r_0 a_0 + \cdots + r_{N-1} a_{N-1}) x^{\deg(f_N)} = a_N x^{\deg(f_N)}$

is the highest-power term in $f_N$. By its definition, $g \in (f_0, \ldots, f_{N-1})$. And
also $f_N \notin (f_0, \ldots, f_{N-1})$, by definition of $f_N$, so $f_N - g \notin (f_0, \ldots, f_{N-1})$.
But

$f_N - g \in I - (f_0, \ldots, f_{N-1})$ has degree less than $\deg(f_N)$,

which contradicts the minimality of $f_N$. $\square$
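The cancellation of the highest-power term is the computational heart of the proof, and it may help to see it carried out once. The following Python sketch assumes $R = \mathbb{Z}$ and represents a polynomial by its list of coefficients, constant term first; the helper names and the three sample polynomials are hypothetical, chosen only so that the leading coefficient of the last one is a combination of the earlier leading coefficients.

```python
def degree(p):
    """Degree of a polynomial given as a coefficient list [c0, c1, ...]; -1 for zero."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return -1

def add(p, q):
    """Sum of two coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def scale_shift(r, k, p):
    """The polynomial r * x**k * p."""
    return [0] * k + [r * c for c in p]

def cancel_leading_term(fs, rs, f_last):
    """Build g = sum of r_i * x**(deg f_last - deg f_i) * f_i and return (g, f_last - g)."""
    d = degree(f_last)
    g = [0]
    for r, f in zip(rs, fs):
        g = add(g, scale_shift(r, d - degree(f), f))
    return g, add(f_last, [-c for c in g])

f0 = [1, 2]        # 2x + 1, leading coefficient a_0 = 2
f1 = [0, 1, 3]     # 3x^2 + x, leading coefficient a_1 = 3
f2 = [5, 0, 0, 1]  # x^3 + 5, leading coefficient a_2 = 1 = (-1)*2 + 1*3
g, remainder = cancel_leading_term([f0, f1], [-1, 1], f2)
```

In this example $g = x^3$ and $f_2 - g = 5$, so the difference has degree 0, which is less than $\deg(f_2) = 3$.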


Hilbert was actually interested in finding a finite basis for polynomial ideals 
in many variables, but this follows from the above formulation of the theorem. 
If $R$ is Noetherian, then so is $R' = R[x_1]$, and hence so is $R'' = R'[x_2] =
R[x_1, x_2]$, and so on.


Exercises 


Any ideal $K$ of $\mathbb{Z}[\sqrt{-5}]$ is finitely generated, so $\mathbb{Z}[\sqrt{-5}]$ is Noetherian. In fact
$K$ is generated by (at most two) elements nearest to 0 in different directions.
The proof assumes only that $K$ is closed under sums and differences.


1. Let u be a nonzero member of K of least absolute value. Show that all 
elements of K on the line through 0 and u are integer multiples mu of u. 

2. Next, let $v$ be a member of $K$ of least absolute value not on the line through 0 and
$u$. Show that the elements $mu + nv$ divide the plane into parallelograms
with adjacent sides of length $|u|$ and $|v|$.
3. Thus any $w \in K$ not of the form $mu + nv$ lies in one of the
parallelograms, either in the interior of the parallelogram or in the interior
of an edge. Show that this leads to an element $x \in K$ such that $|x| < |u|$ or
$|x| < |v|$, contrary to the definition of $u$ and $v$.


Hilbert’s basis theorem shows one way (passing from $R$ to $R[x]$) in which
the Noetherian property passes from one ring to another. Another way is by
forming a quotient. Namely, if $I$ is an ideal of a Noetherian ring $R$, then $R/I$
is also Noetherian. To prove this, use homomorphisms as in Section 5.3.


4. Given a Noetherian ring $R$, a homomorphism $f: R \to R'$ onto a ring $R'$,
and an ideal $I'$ of $R'$, show that $f^{-1}(I')$ is an ideal of $R$.

5. Show that a finite basis of the ideal $f^{-1}(I')$ gives a finite basis of $I'$.

6. Conclude that $R/I$ is Noetherian for any ideal $I$ of a Noetherian ring $R$.


This gives another, purely algebraic, proof that $\mathbb{Z}[\sqrt{-5}]$ is Noetherian.
7. Express $\mathbb{Z}[\sqrt{-5}]$ in the form $\mathbb{Z}[x]/I$.


5.6 Countable Sets 


The step $2 \Rightarrow 1$ in proving the equivalents of the Noetherian condition is more
interesting in the case of a ring $R$ with countably many ideals, where the axiom
of choice is not needed. The rings $\mathbb{Z}_E$ have this property,² so it is worth saying
more about countability.

A set $S$ is called countable if $S = \{x_0, x_1, x_2, x_3, \ldots\}$. In other words, the
members of $S$ can be “listed” as $x_0, x_1, x_2, x_3, \ldots$, so each member of $S$ is $x_n$ for
some natural number $n$. Examples of countable sets (with suitable listings) are

$\mathbb{N} = \{0, 1, 2, \ldots\}$
$\mathbb{Z} = \{0, 1, -1, 2, -2, 3, -3, \ldots\}$
$\mathbb{N} \times \mathbb{N} = \{(0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), (2,1), (3,0), \ldots\}$.

In the latter case the ordered pairs $(m,n)$ are listed in groups: first those with
$m + n = 0$, then those with $m + n = 1$, and so on. And within each group,
pairs are listed in lexicographic order, or “dictionary order.” This ordering
device can also be used to list pairs, triples, and so on, enabling us to show
countability of sets such as $\mathbb{Q}$, $\mathbb{Z}[\sqrt{2}]$, and indeed any algebraic number field
or ring.
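The listing of pairs in groups can be made completely mechanical. As a minimal sketch in Python (the generator name `pairs_in_groups` is our own), the following produces the pairs group by group, with each group in dictionary order:

```python
from itertools import count, islice

def pairs_in_groups():
    """Yield all pairs (m, n) of natural numbers: first those with m + n = 0,
    then those with m + n = 1, and so on, each group in dictionary order."""
    for total in count(0):
        for m in range(total + 1):
            yield (m, total - m)

first_ten = list(islice(pairs_in_groups(), 10))
```

The first ten pairs produced agree with the listing of $\mathbb{N} \times \mathbb{N}$ displayed above.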


² As was remarked by Noether (1926, footnote 26), expanding on a very brief remark she made
about the axiom of choice in Noether (1921).


https://doi.org/10.1017/97810090041 38.007 Published online by Cambridge University Press 




It follows that an algebraic number ring $\mathbb{Z}_E$ has countably many ideals,
since (as we will eventually see) these ideals are finitely generated and hence
may be listed by listing the finite generating sets, using lexicographic order.
Thus it suffices to prove $2 \Rightarrow 1$ in the case of a set $R$ and a countable collection
$\mathcal{C}$ of its subsets. The proof is simple and purely about set containment.


Existence of a maximal element. Suppose $\mathcal{C}$ is a nonempty countable collection of
subsets $I$ of a set $R$ such that any nondecreasing sequence

$I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$

of members of $\mathcal{C}$ eventually becomes constant. Then $\mathcal{C}$ has a maximal element.


Proof. Suppose on the contrary that $\mathcal{C}$ is a nonempty countable collection of subsets of $R$,
satisfying the ascending chain condition, but with no maximal element. Let $I_1$
be any member of $\mathcal{C}$. Then $I_1$ is not maximal, so there is a first member $I_2$ in
the listing of $\mathcal{C}$ such that $I_1 \subset I_2$. Similarly, $I_2$ is not maximal, so there is a
first member $I_3$ in the listing of $\mathcal{C}$ such that $I_2 \subset I_3$.

Continuing in this way gives an infinite chain $I_1 \subset I_2 \subset I_3 \subset \cdots$, contrary
to the ascending chain condition. Hence, $\mathcal{C}$ has a maximal element. $\square$


The crucial difference between this proof and the proof for an arbitrary 
collection C is the ability to find a first member of C satisfying a certain 
condition, and hence define the infinite ascending chain $I_1 \subset I_2 \subset I_3 \subset \cdots$.
In the general case the axiom of choice allows us to assume an ordering of C 
for which there is a “first” element satisfying any condition. Such an ordering 
is called a well ordering, and its existence is equivalent to the axiom of choice 
(and to Zorn’s lemma). It is fair to say that the axiom of choice is a kind of 
magic that allows us to imitate the proof for countable sets. 
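Because “the first member in the listing” is a definite rule, the climb in the countable proof can be programmed. The Python sketch below is an illustration only: it works with a finite listing, so the climb is bound to stop at a maximal member, and it encodes the ideal of multiples of $n$ in $\mathbb{Z}$ by the positive integer $n$, using the fact that the multiples of $a$ are properly contained in the multiples of $b$ exactly when $b$ divides $a$ and $b \ne a$. The function names and the sample listing are assumptions of the example.

```python
def climb_to_maximal(listing, properly_contained):
    """Start at the first listed set and repeatedly move to the first listed set
    that properly contains the current one; stop when there is none."""
    current = listing[0]
    chain = [current]
    while True:
        bigger = next((s for s in listing if properly_contained(current, s)), None)
        if bigger is None:
            return current, chain
        current = bigger
        chain.append(current)

def multiples_properly_contained(a, b):
    """Are the multiples of a properly contained in the multiples of b (a, b > 0)?"""
    return a != b and a % b == 0

# Hypothetical listing of ideals of Z, each named by its positive generator.
listing = [12, 18, 4, 6, 9, 2, 3]
maximal, chain = climb_to_maximal(listing, multiples_properly_contained)
```

No choices are made anywhere: each step is determined by the listing, which is exactly what distinguishes the countable case from the general one.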


Exercises 


The listing of ordered pairs (m,n) can be given by an explicit quadratic 
function known as the Cantor pairing function, which corresponds to the 
visually natural way of ordering the pairs shown in Figure 5.6. 


1. Taking the diagonals to be ordered from left to right, and the pairs on 
each diagonal from top to bottom, explain why (m,n) is the mth element 
on the (m + n)th diagonal. 

2. Deduce that the position of $(m,n)$ in the ordering of pairs is

$m + 0 + 1 + 2 + \cdots + (m+n) = m + \frac{1}{2}(m+n)(m+n+1)$

when $(0,0)$ is taken to be at position 0.

Figure 5.6 Ordering the pairs of natural numbers.

This proves that the function $P(m,n) = m + \frac{1}{2}(m+n)(m+n+1)$ maps the set $\mathbb{N} \times \mathbb{N}$
of ordered pairs one-to-one onto $\mathbb{N}$.


3. Use $P$ to construct a one-to-one function from $\mathbb{N} \times \mathbb{N} \times \mathbb{N}$ to $\mathbb{N}$.
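Before attempting the proofs, the reader may like to test the pairing function numerically. The following Python sketch (a sanity check on an initial segment, not a proof and not a solution to the exercises) lists pairs diagonal by diagonal and compares the position of each pair with the value of $P$; the helper name `pairs_by_diagonal` is our own.

```python
from itertools import count, islice

def P(m, n):
    """Cantor pairing function P(m, n) = m + (m + n)(m + n + 1)/2."""
    return m + (m + n) * (m + n + 1) // 2

def pairs_by_diagonal():
    """List the pairs (m, n) diagonal by diagonal, with m increasing on each diagonal."""
    for total in count(0):
        for m in range(total + 1):
            yield (m, total - m)

first_pairs = list(islice(pairs_by_diagonal(), 500))
positions = [P(m, n) for m, n in first_pairs]
agrees = positions == list(range(500))
```

On this initial segment the value of $P$ at each pair equals the position of the pair in the listing, so `agrees` is true.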


5.7 Discussion 


Since we have singled out the axiom of choice and countability in the last 
two sections, we should not fail to mention what makes them pertinent: the 
existence of uncountable sets, proved first by Cantor (1874) and with greater 
impact by Cantor (1891). In particular, Cantor showed the uncountability of R 
by the following argument. 

Suppose we have a countable set of real numbers $x_1, x_2, x_3, \ldots$. We are
going to show these are not all the real numbers by defining a real number $x$
different from each of $x_1, x_2, x_3, \ldots$. This is easy to do if we suppose we have
the decimal expansion for each $x_i$, say

$x_1 = 0.\mathbf{1}1111\ldots$
$x_2 = 3.1\mathbf{4}159\ldots$
$x_3 = 1.41\mathbf{4}21\ldots$
$x_4 = 0.000\mathbf{0}0\ldots$
$x_5 = 1.2122\mathbf{1}\ldots$
$\vdots$
$x = 0.21112\ldots$

We can assume each expansion is infinite by adding $0000\ldots$ at the end of
any finite expansion. Then we make $x$ different from $x_n$ in the $n$th decimal
place. This is certainly a decimal expansion for $x$ different from the given
expansions $x_1, x_2, x_3, \ldots$. And we can make sure the expansion represents a
different number $x$ by ensuring that it does not end in either $0000\ldots$ or $9999\ldots$,
because any other expansion belongs to only one number. For example, we
could use the rule:


if the $n$th digit of $x_n$ is 1, let 2 be the $n$th digit of $x$;
if the $n$th digit of $x_n$ is not 1, let 1 be the $n$th digit of $x$.


This is how we obtained $x$ in the example above. The digits of $x_1, x_2, x_3, \ldots$
shown in bold are the digits where the rule makes $x$ different from $x_1, x_2, x_3, \ldots$
in turn. Because these digits lie along the diagonal, the method is called the
diagonal argument.

This argument shows that no countable set includes all real numbers, and 
hence R is uncountable. 
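The diagonal rule is simple enough to state as a program. In the Python sketch below, each listed number is represented (as an assumption of the example) by the string of its digits after the decimal point, truncated to five places, and the function name `diagonal_digits` is our own.

```python
def diagonal_digits(expansions):
    """Apply the rule: where the nth digit of the nth expansion is 1, use 2;
    otherwise use 1."""
    return "".join("2" if digits[n] == "1" else "1"
                   for n, digits in enumerate(expansions))

# Digits after the decimal point of the five numbers in the example above.
listed = ["11111", "14159", "41421", "00000", "21221"]
x_digits = diagonal_digits(listed)
```

The result is the digit string `21112`, matching the expansion of $x$ in the example, and it differs from the $n$th listed expansion in the $n$th place.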


5.7.1 The Axiom of Choice 


The axiom of choice asserts the possibility of “choosing” members of sets 
where there is no apparent rule for doing so. The precise statement is the 
following. 


Axiom of choice (AC). For any set $X$ of nonempty sets $Y$, there is a function
$f$ on $X$ with $f(Y) \in Y$ for each $Y \in X$. (The function $f$ is called a choice
function.)


To many people, AC seems obvious, so they willingly accept all of its
consequences. Indeed, when the sets $Y$ are all subsets of a countable set $Z$,
the existence of a choice function is provable. We suppose $z_1, z_2, z_3, \ldots$ is a
list of the members of $Z$, and let $f(Y)$ be the first member of $Y$ on this list.
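The rule “take the first member of $Y$ on the list” is an explicit definition, as the following Python sketch shows. It is necessarily restricted to a finite initial part of a listing and to finite sets $Y$; the function name `make_choice_function` and the sample listing of integers are assumptions made for the illustration.

```python
def make_choice_function(listing):
    """Return f with f(Y) = the member of Y that occurs first in the listing."""
    position = {z: i for i, z in enumerate(listing)}

    def f(Y):
        return min(Y, key=lambda z: position[z])

    return f

# Hypothetical initial part of a listing of the integers.
listing_of_Z = [0, 1, -1, 2, -2, 3, -3]
f = make_choice_function(listing_of_Z)

chosen_a = f({3, -2, 2})   # 2 occurs before -2 and 3 in the listing
chosen_b = f({-3, -1})     # -1 occurs before -3 in the listing
```

No appeal to an axiom is needed here, because the listing itself tells us which member to choose.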

But in the presence of uncountable sets, it is easy to give examples where 
no one can define a choice function. For example, let 


$X = \{\text{all nonempty sets } Y \subseteq \mathbb{R}\}$.


There is no known rule for choosing a real number from each nonempty set of 
reals, and in fact the standard axioms of set theory cannot prove the existence 
of a choice function in this case. That is why AC is an axiom! It is an axiom 
about uncountable sets. 

AC is a particularly remarkable axiom, not only because of its independence
from the other axioms of set theory, proved by a combination of results due
to Gödel (1938) and Cohen (1963), but also because it has many natural-sounding
equivalents. Thus, we might say that AC is the “right axiom” to prove
these equivalent theorems. The first of them was the well-ordering theorem,
first stated by Cantor (1883) as a “fundamental law of thought.” The well-ordering
theorem extends the well-ordering property of the natural numbers,
mentioned in Section 1.1, by asserting that any set, even an uncountable one, can be ordered
in such a way that each of its nonempty subsets has a least member.

Zermelo (1904) introduced AC in order to prove the well-ordering theorem, 
and it is easy to show that well-ordering implies AC. Thus the well-ordering 
theorem is an equivalent of AC. 

A second equivalent is Zorn’s lemma, described in the exercises to 
Section 5.4. Zorn’s lemma is actually due to Kuratowski (1922), but it was 
rediscovered by Zorn (1935), and it became the “algebraist’s axiom of choice” 
when popularized by Bourbaki. Because Zorn’s lemma asserts existence of 
maximal elements, it gives almost instant proofs that maximal algebraic 
objects exist. Two examples where existence of the algebraic object is actually 
equivalent to AC are: 


Existence of a maximal ideal in any nonzero ring. This was proved by 
Krull (1929), assuming the well-ordering theorem. Hodges (1979) proved 
that existence of maximal ideals implies AC. 

Existence of a vector space basis. This was proved (essentially) by Hamel 
(1905), though he was interested in a special case we will discuss in the 
exercises to Section 6.1. Blass (1984) proved that the existence of vector 
space bases implies AC. 


However, there are cases where the weaker axiom of dependent choice
suffices, as in the equivalence proof for the three statements of the Noetherian
property. Dependent choice is weaker than AC because it is known not to
imply some consequences of AC. A famous question that distinguishes the 
two forms of choice concerns nonmeasurable sets: Vitali (1905) showed that 
AC implies there are nonmeasurable subsets of R, but Solovay (1970) showed 
it is consistent with dependent choice for all subsets of R to be Lebesgue 
measurable. 


5.7.2 Axioms of Infinity for Countable Algebra 


As we showed in the case of the Noetherian condition, in countable algebraic 
structures it is possible to prove the existence of maximal elements without 
AC. This raises the question: if we do not need AC to prove a certain theorem, 
exactly what assumption about infinity do we need? In recent decades,
reverse mathematics has found very precise answers to this question for 
many classical theorems of analysis, topology, and algebra. It has also found, 
remarkably, that the same few assumptions about infinity turn out to be the 
“right axioms” to prove most of the well-known theorems. 

In the case of algebra, reverse mathematics is concerned with countable 
structures, particularly countable rings, fields, and groups. The foundation of 
any study of countable structures is the theory of the natural numbers; that is, 
arithmetic. However, we now have to describe arithmetic very precisely, in a 
language that typically has the following ingredients: 


1. The constant $0$ and the function symbol $S$ (for successor), which give us
names $0, S0, SS0, \ldots$ for the natural numbers.
2. The symbols $+$ and $\cdot$ for sum and product.
3. Variables $a, b, c, \ldots, x, y, z, \ldots$ for natural numbers and $X, Y, Z, \ldots$ for
sets of natural numbers.
4. Logic symbols: $\land$, $\lor$, $\lnot$, $\Rightarrow$ for “and,” “or,” “not,” “implies,” the
quantifiers $\forall$ and $\exists$ for “for all” and “there exists,” and the parentheses
( and ).
5. The symbols $\in$ for membership and $=$ for equality.


Given some rules for writing meaningful formulas, quite obvious but tedious 
to enumerate, we can then write down the Peano axioms for arithmetic: 


$\forall x\,(0 \ne Sx)$ (0 is not a successor)
$\forall x\,\forall y\,(x \ne y \Rightarrow Sx \ne Sy)$ ($S$ is injective)
$\forall x\,\forall y\,(x + 0 = x \land x + Sy = S(x + y))$ (inductive definition of $+$)
$\forall x\,\forall y\,(x \cdot 0 = 0 \land x \cdot Sy = x \cdot y + x)$ (inductive definition of $\cdot$)
$\forall X\,((0 \in X \land \forall n\,(n \in X \Rightarrow n + 1 \in X)) \Rightarrow \forall n\,(n \in X))$ (induction axiom)


The induction axiom formalizes the “base step, induction step” idea of
induction. It says that if $X$ includes 0, and if $X$ includes $n + 1$ when it includes
$n$, then $X$ includes all natural numbers.
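The inductive definitions of $+$ and $\cdot$ are, in effect, recursive programs. As a minimal sketch in Python (which models natural numbers by nonnegative Python integers, so that a successor $Sy$ is recognized as a nonzero number and its predecessor $y$ is obtained by subtracting 1; the function names are our own), they read:

```python
def S(n):
    """Successor."""
    return n + 1

def add(x, y):
    """x + 0 = x and x + Sy = S(x + y)."""
    return x if y == 0 else S(add(x, y - 1))

def mul(x, y):
    """x * 0 = 0 and x * Sy = x * y + x."""
    return 0 if y == 0 else add(mul(x, y - 1), x)

three_plus_four = add(3, 4)
three_times_four = mul(3, 4)
```

Each call unwinds the second argument down to 0, just as the axioms reduce any sum or product of numerals to a numeral.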

To these we need to add a set existence axiom, because as yet we have no 
axiom allowing us to prove that sets of natural numbers exist. A natural axiom 
to choose is one saying that a set exists if it has a definition in the language 
of arithmetic. For example, the following formula $\varphi(x)$ defines “$x$ is an even
number”:

$\exists y\,(x = SS0 \cdot y)$.

We might also want to talk about the even numbers in a given set $Z$. These are
defined by the formula

$x \in Z \land \exists y\,(x = SS0 \cdot y)$.

Motivated by these examples, we state the arithmetic comprehension axiom:

$\exists X\,\forall n\,(n \in X \Leftrightarrow \varphi(n))$,

where $\varphi(n)$ is any formula in the language of arithmetic with no quantified set
variables and not including the set variable $X$. We exclude $X$ from $\varphi$ to avoid
circularity (defining $X$ in terms of itself), and we do not allow the other set
variables $Z$ in $\varphi$ to be quantified, since that too is circular. (If $X$ depends on a
condition involving all sets, then $X$ depends on itself.)
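Comprehension is the formal counterpart of “set-builder” definitions, and for the two example formulas it can be imitated on an initial segment of the natural numbers. The Python sketch below is only an illustration: the names `is_even` and `comprehension`, the bound 10, and the sample set `Z` are assumptions, and the unbounded quantifier $\exists y$ is replaced by a search over $y \le x$, which is harmless here because any witness $y$ for $x = SS0 \cdot y$ is at most $x$.

```python
def is_even(x):
    """The formula: there exists y with x = SS0 * y (searching only y <= x)."""
    return any(x == 2 * y for y in range(x + 1))

def comprehension(phi, bound):
    """The part of the set {n : phi(n)} that lies below the given bound."""
    return {n for n in range(bound) if phi(n)}

evens = comprehension(is_even, 10)

Z = {1, 4, 6, 7, 9}  # a hypothetical given set
evens_in_Z = comprehension(lambda x: x in Z and is_even(x), 10)
```

Note that the defining condition for `evens_in_Z` mentions the set `Z` only through membership, never through a quantifier over sets, in keeping with the restriction on $\varphi$.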

In this axiom system it is possible to prove the countable versions of the 
maximal ideal and vector space basis theorems mentioned in the previous 
subsection: 


e Any nonzero countable ring has a maximal ideal. 
e Any countable vector space has a basis. 


Moreover, each of these theorems implies the arithmetic comprehension 
axiom, so the arithmetic comprehension axiom is the “right axiom” to prove 
them. This was proved by Friedman et al. (1983), and the proofs may be seen 
in Simpson (2009). 

The system of Peano axioms plus arithmetic comprehension is called
ACA$_0$. There are also two noteworthy weaker systems, RCA$_0$ and WKL$_0$,
which are more complicated to describe. In the weakest system, RCA$_0$, only
“computable” sets are assumed to exist. Rather surprisingly, RCA$_0$ is strong
enough to prove the fundamental theorem of algebra. If Kronecker had known
this, perhaps he might have changed his mind about FTA! For an introduction
to ACA$_0$, WKL$_0$, and RCA$_0$, see Stillwell (2018).
Vector Spaces


Preview 


The difference between fields and rings is reflected by a difference in the kinds 
of space for which they can serve as “coordinates.” A space with coordinates 
from a field $F$ is called a vector space (“over” $F$, or an $F$-vector space) and
a space with coordinates from a ring R is called a module (an R-module). Of 
course there are similarities between vector spaces and modules, but vector 
spaces are easier to work with. So we discuss vector spaces in this chapter, and 
in Chapter 8 we see how far their properties carry over to modules. 

The key properties of a vector space are the existence of a basis and an 
associated dimension. These properties behave as expected with respect to 
subspaces: if U is a subspace of a vector space V, then U has a basis no larger 
than any basis of V. Dimension also behaves properly with respect to linear 
maps, the maps that preserve vector sums and scalar multiples, and which may 
be represented by matrices. 

When vector spaces are used in number theory, it is common to vary the
field $F$ of scalars, and to view one field $E$ relative to, or “over,” a subfield $F$.
This leads to viewing $E$ as a vector space over $F$. A basis of this vector space
is then called a “basis for $E$ over $F$,” its dimension is the “dimension of $E$
over $F$,” and so on. Typically, $E = \mathbb{Q}(\alpha)$ for some algebraic number $\alpha$. Then
to investigate some $\beta \in E$, we may consider $F = \mathbb{Q}(\beta)$ and look at both $E$
over $F$ and $F$ over $\mathbb{Q}$, and the associated bases and dimensions.

Since this approach to linear algebra is more general than the typical 
undergraduate course, we spend some time reviewing basic facts such as 
existence of bases in this chapter, and continue in the next with the deeper 
theory surrounding the determinant. 




https://doi.org/10.1017/97810090041 38.008 Published online by Cambridge University Press 




6.1 Vector Space Basis and Dimension 


I expect that most readers of this book have had a course in linear algebra, 
with linear equations, matrices, linear maps, determinants, and perhaps a 
concept of vector space. Very likely, the coordinates of all the spaces, matrix 
entries, and coefficients of equations were real numbers, because these are 
the numbers relevant to most users of linear algebra. However, solving linear 
equations requires only the operations of addition, subtraction, multiplication, 
and division, so linear algebra can take place over any field F. Most of what 
the reader already knows can be applied to any field. The basic definitions are: 


Definitions. A vector space over a field $F$, or an $F$-vector space, is a set $V$
of objects called vectors which can be added and multiplied by members of
$F$. (Members of $F$ are also called scalars in this context.) $V$ includes a zero
vector $0$ and, for each vector $v$, an additive inverse $-v$.

Vector addition has the properties of an Abelian group (written in additive 
notation, as in Section 3.7), namely 


$u + v = v + u$,
$u + (v + w) = (u + v) + w$,
$u + 0 = u$,
$u + (-u) = 0$.


Scalar multiplication has the properties, for any $a, b \in F$,


$a(bu) = (ab)u$,
$1u = u$,
$a(u + v) = au + av$,
$(a + b)u = au + bu$.

Thus, although addition and multiplication now have a more general meaning,
their properties are like those for ordinary addition and multiplication.¹


This makes it easy to compute with vectors and scalars, as long as one does
not generally attempt to multiply or divide vectors!


¹ In fact, when the field $\mathbb{R}$ of real numbers is viewed as a vector space over itself, addition and
scalar multiplication are just the ordinary addition and multiplication, because the real numbers
are both vectors and scalars.
In abstract algebra, the option to choose the field $F$ is important, because
more than one field may be under consideration at the same time. For example,
the field


$\mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\}$


can be viewed as a vector space over itself, by viewing members of $\mathbb{Q}(\sqrt{2})$ as
both “vectors” and “scalars.” However, it is more enlightening to view $\mathbb{Q}(\sqrt{2})$
as a vector space over the field $\mathbb{Q}$, so the “scalars” are the rational numbers.
From this viewpoint, it seems that $\mathbb{Q}(\sqrt{2})$ is two-dimensional relative to $\mathbb{Q}$,
because each $a + b\sqrt{2} \in \mathbb{Q}(\sqrt{2})$ has two rational “coordinates,” $a$ and $b$.

This concept of dimension is formalized via the following definitions, 
which extend those given in Section 4.5 in the special case where fields are 
viewed as vector spaces. 


Definitions. Members $v_1, \ldots, v_n$ of a vector space $V$ over a field $F$ are said
to be linearly independent if


$a_1 v_1 + \cdots + a_n v_n = 0$, where $a_1, \ldots, a_n \in F$,


only if $a_1 = \cdots = a_n = 0$.
Members $v_1, \ldots, v_n$ are said to span $V$ if, for each $v \in V$,


$v = a_1 v_1 + \cdots + a_n v_n$ for some $a_1, \ldots, a_n \in F$.


A finite basis for $V$ over $F$ is a linearly independent set of vectors $v_1, \ldots, v_n$
in $V$ that span $V$. In this case $V$ is said to be finite-dimensional, and the
dimension of $V$ over $F$ is $n$.


According to these definitions, the vector space $\mathbb{Q}(\sqrt{2})$ has a basis with
members $1$ and $\sqrt{2}$, so it has dimension 2 over $\mathbb{Q}$, as anticipated. (We pointed
out in Section 4.5 that $1$ and $\sqrt{2}$ are linearly independent over $\mathbb{Q}$, because $\sqrt{2}$
is irrational.) More generally, an algebraic number field $\mathbb{Q}(\alpha)$ of degree $n$ has
dimension $n$ as a vector space over $\mathbb{Q}$ because it has basis $1, \alpha, \ldots, \alpha^{n-1}$, as
we showed in Section 4.5.

The existence of a basis for an arbitrary vector space over an arbitrary field 
is more problematic, because it is equivalent to the axiom of choice. A famous 
example is R as a vector space over Q: even here, existence of a basis depends 
on the axiom of choice. Fortunately, the bases we want to study in this book 
are finite. However, we still need to check that “dimension” is well defined, 
because it is conceivable that finite bases of different sizes exist. 
Exercises


The idea of viewing $\mathbb{R}$ as a vector space over $\mathbb{Q}$ goes back to Hamel (1905). He
used it to create a discontinuous function $f$ such that $f(x + y) = f(x) + f(y)$
for any $x, y \in \mathbb{R}$, a property previously known to hold only for the continuous
functions $f(x) = kx$.

A Hamel basis for R over Q is obtained by a “maximal” argument like 
the one used in the exercises to Section 5.4 to prove the existence of maximal 
ideals. 


1. Suppose that $B$ is a set of real numbers that are linearly independent over
$\mathbb{Q}$. That is, if $x_1, \ldots, x_n \in B$ and $q_1, \ldots, q_n \in \mathbb{Q}$, then


$q_1 x_1 + \cdots + q_n x_n = 0 \Rightarrow q_1 = \cdots = q_n = 0$.


Show that if $B$ is not a basis, then there is an independent set $B' \supset B$
whose span includes some real $x'$ not in the span of $B$.

2. Suppose $\mathcal{B}$ is a collection of independent sets $B_i$ that is linearly ordered by
$\subseteq$; that is, if $B_i, B_j \in \mathcal{B}$, then either $B_i \subseteq B_j$ or $B_j \subseteq B_i$. Show that if
$x_1, \ldots, x_n$ are members of sets in $\mathcal{B}$, then in fact $x_1, \ldots, x_n$ belong to a
single $B_i \in \mathcal{B}$.

3. Deduce that the union $B$ of all sets $B_i \in \mathcal{B}$ is also an independent set, so $B$
is an upper bound to $\mathcal{B}$.


Now according to Zorn’s lemma (see Section 5.4), the collection $\mathcal{C}$ of all
independent sets of real numbers has a maximal element, $C$.


4. Deduce from exercise 1 that $C$ is a basis for $\mathbb{R}$ over $\mathbb{Q}$.
5. By the same argument, show that any field $E$ has a basis over a subfield
$F \subseteq E$.


We can now use the Hamel basis $B$ of $\mathbb{R}$ over $\mathbb{Q}$, as Hamel did, to find a
discontinuous function $f$ such that $f(x + y) = f(x) + f(y)$.


6. Let $x_0$ be any particular member of $B$, and let $f(x)$ be the coefficient of $x_0$
in the unique expression $x = q_1 x_1 + \cdots + q_n x_n$ of $x$
as a linear combination of members of $B$ with rational coefficients
(taking $f(x) = 0$ if $x_0$ does not occur in this expression). Explain why
$f(x + y) = f(x) + f(y)$.


7. Also explain why f is not continuous. 
6.2 Finite-Dimensional Vector Spaces


In Section 4.5 we already appealed to the concept of dimension, when claiming
that a basis of $n$ elements implies that any set of $n + 1$ elements is linearly
dependent. The context was a field $E$, viewed as a vector space over a field $F$.
The proof for any finite-dimensional vector space $V$ over $F$ can be done via
the following lemma, often called the “Steinitz exchange lemma” but known
earlier to Grassmann (1862).


Grassmann exchange lemma. If $n$ vectors span $V$, then no $n + 1$ vectors are
independent.

Proof. Suppose (for the sake of contradiction) that $u_1, \dots, u_n$ span $V$ and that
$v_1, \dots, v_n, v_{n+1} \in V$ are linearly independent. The plan is to replace some $u_i$
by $v_1$ while ensuring that $v_1$ and the remaining $u_j$ still span $V$. We can likewise
replace another $u_i$ by $v_2$, and so on. In the end, all the $u_i$ are replaced, and we
have a spanning set $v_1, \dots, v_n$. In particular, $v_{n+1}$ is a linear combination of
$v_1, \dots, v_n$, contrary to the assumption of linear independence.

To see why this plan succeeds, suppose we have replaced $m - 1$ of the $u_i$
by $v_1, \dots, v_{m-1}$, so now we want to replace another $u_i$ by $v_m$. Success so far
means that $v_1, \dots, v_{m-1}$ and the remaining $u_i$ span $V$, so

\[ v_m = a_1v_1 + \cdots + a_{m-1}v_{m-1} + \text{terms } b_iu_i, \]

where $a_1, \dots, a_{m-1}$ and the $b_i$ are in $F$. Since $v_m$ is not a linear combination of
$v_1, \dots, v_{m-1}$ (by their linear independence), some $b_i \neq 0$. But then, dividing
by $b_i$, we get $u_i$ as a linear combination of $v_1, \dots, v_m$ and the remaining $u_j$.
So $u_i$ can be replaced by $v_m$, and we still have a spanning set.

Thus, the exchange process can be continued until all the $u_i$ are replaced,
leading to the contradiction foreshadowed above. This proves the exchange
lemma. □
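
The lemma is easy to test on small cases. The following is a minimal sketch, assuming vectors in $\mathbb{Q}^3$ are given as lists of rationals; the helper names `rank` and `is_independent` are illustrative and do not come from the text. It finds the rank by elimination over $\mathbb{Q}$ (not by the exchange process used in the proof) and confirms that four vectors cannot be independent in a space spanned by three.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    """A list of vectors is independent exactly when its rank equals its length."""
    return rank(vectors) == len(vectors)

# Three vectors that span Q^3, and four arbitrary (hypothetical) test vectors.
spanning = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
candidates = [[1, 2, 3], [0, 1, 4], [5, 6, 0], [7, 8, 9]]

print(is_independent(candidates[:3]))  # three of them can be independent
print(is_independent(candidates))      # all four cannot
```

Exact `Fraction` arithmetic is used so that the dependence test is not disturbed by rounding.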


Corollary. If V has a basis over F with n elements, then all bases of V over 
F have n elements. (Vector space dimension is well defined.) 


Proof. Suppose that $u_1, \dots, u_n$ is a basis for $V$ over $F$. Then any larger set is
not linearly independent, by the lemma, and hence not a basis. Similarly, no
smaller set is a basis either, otherwise $u_1, \dots, u_n$ would not be a basis, by the
lemma again. □


Finite-dimensionality has important consequences when $V$ is an extension
field $E \supseteq F$. In this case, as we saw in Section 4.5, if $E$ has dimension $n$,
then each $x \in E$ is algebraic of degree $\le n$ over $F$. This is because the $n + 1$
elements $1, x, x^2, \dots, x^n$ are linearly dependent over $F$, which means that $x$
satisfies an equation of degree at most $n$ with coefficients in $F$.

Moreover, if $D \supseteq E \supseteq F$ are fields, with each extension of finite dimension,
then we can calculate the dimension of $D$ over $F$ from the dimensions of
$D$ over $E$ and of $E$ over $F$. This was noted by Dedekind (1894) without proof.
The proof in fact is very simple, underlining the power of the basis concept.
Since the key concept of relative dimension will be useful later, we introduce
the following:


Notation. If $E \supseteq F$ is a field extension of finite dimension, we denote the
dimension of $E$ over $F$ by $[E : F]$.


If one does not think in terms of bases, then it is hard to see that the number
$\sqrt[3]{2} + \sqrt[5]{7}$, say, is even algebraic. The number $\sqrt[3]{2}$ belongs to a field of degree 3,
hence of dimension 3, over $\mathbb{Q}$, and $\sqrt[5]{7}$ similarly belongs to a field of dimension
5 over $\mathbb{Q}$, but what kind of a field contains their sum? The answer is a field of
degree 15, by the following theorem.


Dedekind product theorem. If $D \supseteq E \supseteq F$ are fields, such that $[D : E] = m$
and $[E : F] = n$, then $[D : F] = mn = [D : E][E : F]$.


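Proof. Let $\delta_1, \dots, \delta_m$ be a basis for $D$ over $E$, so any $d \in D$ can be written

\[ d = e_1\delta_1 + \cdots + e_m\delta_m \quad \text{for some } e_1, \dots, e_m \in E, \tag{*} \]

and let $\varepsilon_1, \dots, \varepsilon_n$ be a basis for $E$ over $F$, so any $e \in E$ can be written

\[ e = f_1\varepsilon_1 + \cdots + f_n\varepsilon_n \quad \text{for some } f_1, \dots, f_n \in F. \tag{**} \]

It follows, since each $e_i$ in (*) can be rewritten in terms of the basis elements
$\varepsilon_j$ as in (**), that the $mn$ elements $\delta_i\varepsilon_j$ span $D$ over $F$.

Also, the elements $\delta_i\varepsilon_j$ of $D$ are linearly independent over $F$.
If a sum of terms $f_{ij}\delta_i\varepsilon_j$ equals 0 with the $f_{ij} \in F$, then we have

\begin{align*}
0 = {} & (f_{11}\varepsilon_1 + \cdots + f_{1n}\varepsilon_n)\delta_1 \\
       & + (f_{21}\varepsilon_1 + \cdots + f_{2n}\varepsilon_n)\delta_2 \\
       & \;\;\vdots \\
       & + (f_{m1}\varepsilon_1 + \cdots + f_{mn}\varepsilon_n)\delta_m.
\end{align*}

Since the $\delta_i$ are linearly independent over $E$, their coefficients vanish:

\[ f_{i1}\varepsilon_1 + \cdots + f_{in}\varepsilon_n = 0 \quad \text{for each } i. \]

Then, since the $\varepsilon_j$ are also linearly independent over $F$, their coefficients
$f_{ij} = 0$. This shows that the $mn$ elements $\delta_i\varepsilon_j$ are linearly independent, and
hence they form a basis for $D$ over $F$. □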


It follows that the sum of any two algebraic numbers, $\alpha$ of degree $m$ and
$\beta$ of degree $n$, respectively, is an algebraic number of degree $\le mn$. This is
because $\alpha$ belongs to a field $E$ of dimension $m$ over $\mathbb{Q}$, in which case $\alpha + \beta$
belongs to the field $D = E(\beta)$, which is of degree $\le n$ over $E$, and hence of
degree $\le mn$ over $\mathbb{Q}$. (By Section 4.5, any member of a field of degree $d$ over
$\mathbb{Q}$ has degree $\le d$.) By a similar argument, $\alpha\beta$ also has degree $\le mn$ over $\mathbb{Q}$.
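
The bound $mn$ can be seen at work in a small case. The sketch below is illustrative only: the number $\sqrt{2} + \sqrt{5}$, the function names `mul` and `evaluate`, and the quartic being tested are choices made for this example, not taken from the text. Elements of $\mathbb{Q}(\sqrt{2}, \sqrt{5})$ are stored as coefficient tuples on the product basis $1, \sqrt{2}, \sqrt{5}, \sqrt{10}$ (here $m = n = 2$), and the code checks by exact arithmetic that $\sqrt{2} + \sqrt{5}$ satisfies an equation of degree $4 = mn$.

```python
# An element c0 + c1*sqrt2 + c2*sqrt5 + c3*sqrt10 is stored as (c0, c1, c2, c3).
# The basis 1, sqrt2, sqrt5, sqrt10 consists of the products of the bases
# {1, sqrt2} and {1, sqrt5}, as in the Dedekind product theorem.

def mul(x, y):
    """Product of two elements of Q(sqrt2, sqrt5) in the product basis."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (
        a0 * b0 + 2 * a1 * b1 + 5 * a2 * b2 + 10 * a3 * b3,  # rational part
        a0 * b1 + a1 * b0 + 5 * (a2 * b3 + a3 * b2),          # sqrt2 part
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),          # sqrt5 part
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,                # sqrt10 part
    )

def evaluate(coeffs, x):
    """Value of sum(coeffs[k] * x**k), computed inside Q(sqrt2, sqrt5)."""
    total = (0, 0, 0, 0)
    power = (1, 0, 0, 0)  # x**0
    for c in coeffs:
        total = tuple(t + c * p for t, p in zip(total, power))
        power = mul(power, x)
    return total

alpha = (0, 1, 1, 0)          # sqrt2 + sqrt5
quartic = [9, 0, -14, 0, 1]   # x^4 - 14x^2 + 9, lowest degree first

print(mul(alpha, alpha))         # alpha^2 = 7 + 2*sqrt10
print(evaluate(quartic, alpha))  # all four coordinates vanish
```

The five powers $1, \alpha, \dots, \alpha^4$ live in a space of dimension 4, so some such relation is forced; the code merely confirms one particular relation.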

Thus, it is quite easy to prove that the sum or product of algebraic 
numbers is an algebraic number. Proving that the sum of algebraic integers 
is an algebraic integer, as in Section 4.6, was not so simple, because it 
invoked the determinant concept. We will need this concept, and other more 
sophisticated parts of linear algebra, in the study of algebraic integers and their 
generalizations. 


6.2.1 Subspaces 


Definition. A subspace U of an F-vector space V is a nonempty subset of V 
closed under vector addition and multiplication by members of F. 


It is easy to see that a subspace $U$ of $V$ is itself an $F$-vector space. We
naturally expect that dimension of $U$ $\le$ dimension of $V$, and this is correct.
Indeed, it is a corollary of the Grassmann exchange lemma.


Dimension of a subspace. If $U$ is a subspace of a finite-dimensional vector
space $V$, then dimension of $U$ $\le$ dimension of $V$.


Proof. If $V$ has dimension $n$, then any $n + 1$ vectors in $U$ are also in $V$, and
hence linearly dependent by the Grassmann exchange lemma. Skipping the
trivial case $U = \{0\}$, we find a basis of $U$ by the following process.

Choose any nonzero $u_1 \in U$. If $u_1$ does not span $U$, choose a $u_2 \in U$ not
in the span of $u_1$. If $u_1, u_2$ do not span $U$, choose $u_3 \in U$ not in the span of
$u_1, u_2$, and so on. Then we have an independent set at each stage, because if
$u_1, \dots, u_k$ are independent and $u_1, \dots, u_k, u_{k+1}$ are not, $u_{k+1}$ has a nonzero
coefficient $a_{k+1} \in F$ in a relation

\[ a_1u_1 + \cdots + a_ku_k + a_{k+1}u_{k+1} = 0, \quad \text{where } a_1, \dots, a_k, a_{k+1} \in F, \]

which means $u_{k+1} = -a_{k+1}^{-1}(a_1u_1 + \cdots + a_ku_k)$ is in the span of $u_1, \dots, u_k$,
contrary to the choice of $u_{k+1}$.
Thus, the process must terminate before we get $n + 1$ vectors.
At termination we have an independent spanning set with $< n + 1$ vectors,
so $U$ has dimension $\le n$. □
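
The selection process in this proof can be sketched in code. The version below makes a simplifying assumption that is not in the proof: instead of choosing from all of $U$, it chooses from a finite list of candidate vectors assumed to span $U$. The names `rank` and `greedy_basis` and the example vectors are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over Q, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def greedy_basis(candidates):
    """Keep each candidate that is not in the span of those already kept."""
    basis = []
    for u in candidates:
        # u lies outside the current span exactly when adding it raises the rank.
        if rank(basis + [u]) > len(basis):
            basis.append(u)
    return basis

# Hypothetical example: vectors in the plane x + y + z = 0 inside Q^3.
candidates = [[1, -1, 0], [2, -2, 0], [0, 1, -1], [1, 0, -1]]
basis = greedy_basis(candidates)
print(basis)
```

The process stops with two vectors, consistent with the plane having dimension 2, which is less than the dimension 3 of the ambient space.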


This proof depends on $F$ being a field, because we divide by $a_{k+1}$ in the
final step. The corresponding statement with a ring R in place of F does not 
hold for all rings R, though it holds for certain rings such as Z, as we will see 
in Section 8.4. We will also see there that the proof can avoid division, but 
at the cost of a certain amount of gymnastics with addition, subtraction, and 
multiplication. 


Exercises 


We can now identify dimension with degree, as foreshadowed in Section 4.5, 
where the degree of an algebraic number a is the degree of the irreducible 
polynomial (which is unique up to constant factors) satisfied by a. 

A famous application of dimension, and the Dedekind product theorem in 
particular, concerns the numbers a obtained from rational numbers by nested 
square roots. 


1. Observing that $\alpha = \sqrt{1 + \sqrt{2}}$ satisfies the equation
\[ \alpha^2 - (1 + \sqrt{2}) = 0, \]
with coefficients in $\mathbb{Q}(\sqrt{2})$, explain why the field $\mathbb{Q}(\alpha)$ is of degree either
1 or 2 over $\mathbb{Q}(\sqrt{2})$.

2. Deduce that $\mathbb{Q}(\alpha)$ is of degree 2 or 4 over $\mathbb{Q}$.

3. Generalizing the argument in the previous two questions, explain why a
number $\beta$ obtained by nested square roots from rationals generates a field
$\mathbb{Q}(\beta)$ of degree $2^m$ over $\mathbb{Q}$, for some integer $m$.

4. Show, from the irrationality of $\sqrt[3]{2}$, that $\mathbb{Q}(\sqrt[3]{2})$ is of degree 3 over $\mathbb{Q}$, and
hence that $\sqrt[3]{2}$ does not belong to any field obtained by square roots.


The latter result implies that $\sqrt[3]{2}$ is not obtainable from rational numbers
by the rational operations $+, -, \times, \div$ and square roots. This answers an ancient
geometric question called duplication of the cube, by showing that the length
$\sqrt[3]{2}$ cannot be obtained from the unit length by a so-called straightedge and
compass construction. (For more information on the algebra of geometric
constructions, see for example Stillwell (1994).)


2 It may not be clear whether $\alpha$ is in $\mathbb{Q}(\sqrt{2})$ or not, but it doesn’t matter. Its degree over $\mathbb{Q}(\sqrt{2})$ is
at most 2.


6.3 Linear Maps


Linear maps are maps that preserve “vector space structure” in the sense of the 
following definitions. 


Definitions. If $V$ and $V'$ are $F$-vector spaces, then a map $f : V \to V'$ is called
linear (or, if we want to emphasize the field of scalars, $F$-linear) if it preserves
vector sums and scalar multiples, that is:

\[ f(v_1 + v_2) = f(v_1) + f(v_2) \quad \text{and} \quad f(av) = af(v) \]

for any $v, v_1, v_2 \in V$, $a \in F$.


A linear map $f : V \to V'$ is also called a vector space homomorphism. If
$f$ is also injective (one-to-one) and surjective (onto), then it is called an
isomorphism or, specifically, a vector space isomorphism.


A homomorphism (which comes from the Greek for something like “similar
form”) also preserves the zero vector and additive inverses. For example,

\begin{align*}
v + 0 = v &\Rightarrow f(v + 0) = f(v) \\
&\Rightarrow f(v) + f(0) = f(v) && \text{by linearity} \\
&\Rightarrow f(0) = 0 && \text{by uniqueness of zero vector.}
\end{align*}


An isomorphism (from the Greek for “same form’’) preserves not only sums 
and scalar multiples, but also dimension, as the following theorem shows. 


Invariance of dimension under isomorphisms. If $V$ has a finite basis, a
vector space isomorphism $f : V \to V'$ maps a basis of $V$ to a basis of $V'$,
necessarily of the same size.


Proof. If $v_1, \dots, v_n$ is a basis of $V$, then $f(v_1), \dots, f(v_n)$ is a basis of $V'$
because:


1. Since $v_1, \dots, v_n$ span $V$, we can write any $v \in V$ in the form
\[ v = a_1v_1 + \cdots + a_nv_n \quad \text{for some } a_1, \dots, a_n \in F. \]
It follows by linearity that
\[ f(v) = a_1f(v_1) + \cdots + a_nf(v_n). \]
The latter equation shows that $f(v_1), \dots, f(v_n)$ span $V'$, because $f$ is
onto $V'$ and hence any member of $V'$ is of the form $f(v)$ for some $v \in V$.

2. To see why $f(v_1), \dots, f(v_n)$ are linearly independent, notice that
\begin{align*}
0 = a_1f(v_1) + \cdots + a_nf(v_n) & \quad \text{for some } a_1, \dots, a_n \in F \\
\Rightarrow 0 = f(a_1v_1 + \cdots + a_nv_n) & \quad \text{by linearity} \\
\Rightarrow f(0) = f(a_1v_1 + \cdots + a_nv_n) & \quad \text{since } 0 = f(0) \\
\Rightarrow 0 = a_1v_1 + \cdots + a_nv_n & \quad \text{since $f$ is injective} \\
\Rightarrow a_1 = \cdots = a_n = 0 & \quad \text{since $v_1, \dots, v_n$ are linearly independent.}
\end{align*}


Thus $f(v_1), \dots, f(v_n)$ is a basis of $V'$, so both $V$ and $V'$ have dimension $n$. □


Corollary. If $V$ has a finite basis, a linear map $f : V \to V$ is injective if and
only if it is surjective.


Proof. If $f$ is injective, then part 2 of the proof above shows $f(v_1), \dots, f(v_n)$
are linearly independent. If they do not span, then some $v \in V$ is not in
their span, in which case $v, f(v_1), \dots, f(v_n)$ are independent, contrary to the
dimension of $V$ being $n$. So in fact, for any $v \in V$,

\[ v = a_1f(v_1) + \cdots + a_nf(v_n) = f(a_1v_1 + \cdots + a_nv_n) \]

for some $a_1, \dots, a_n \in F$. That is, $f$ is surjective.

If $f$ is surjective, then part 1 of the proof above shows that $f(v_1), \dots, f(v_n)$
span $V$. If $f$ is not injective, then

\[ f(a_1v_1 + \cdots + a_nv_n) = a_1f(v_1) + \cdots + a_nf(v_n) = 0 \]

for some $a_1, \dots, a_n \in F$, not all zero, in which case some $f(v_i)$ is a linear
combination of the other $f(v_j)$. So the other $f(v_j)$ span $V$, again contrary to
the dimension of $V$ being $n$. □


6.3.1 Matrices 


It follows from the calculations in the theorem on invariance of dimension
that the value of a linear map $f$ on any vector $v$ is determined by its values
$f(v_1), \dots, f(v_n)$ on a set of basis vectors $v_1, \dots, v_n$. Thus, if $a_{ij}$ are the
elements of $F$ such that

\begin{align*}
f(v_1) &= a_{11}v_1 + \cdots + a_{1n}v_n \\
&\;\;\vdots \\
f(v_n) &= a_{n1}v_1 + \cdots + a_{nn}v_n
\end{align*}

then $f$ is determined by the matrix

\[ \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}. \]


We will assume that the basic properties of matrices are known from an introductory
linear algebra course, starting with the fact that matrix multiplication
reflects composition of linear maps. It is important to remember, however, that 
the matrix of a linear map depends on a choice of basis. Sometimes we wish 
to use properties of a matrix that are independent of basis, and hence intrinsic 
properties of the corresponding linear map. 

Two basis-independent quantities are the trace and the determinant of a 
matrix A, written as Tr(A) and det(A). Both of these invariants involve only 
the ring operations of sum, difference, and product, as we will see in Section 
7.5. Therefore, they also apply in a setting where we have only a ring in place 
of a field. This is the setting of modules, which we study in Chapter 8. 
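
The dependence of a matrix on the basis, and the independence of trace and determinant, can be illustrated numerically. The following is a small sketch with hypothetical $2 \times 2$ matrices `A` (the map) and `B` (the change of basis); it checks one instance and proves nothing in general. The helper names are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def trace(X):
    return X[0][0] + X[1][1]

def inverse(X):
    """Inverse of an invertible 2 x 2 matrix, using only field operations."""
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

A = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]  # matrix of a map
B = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(5)]]  # change of basis

C = matmul(matmul(B, A), inverse(B))  # the same map in the new basis

print(C != A)                   # the matrix itself changes
print(trace(C) == trace(A))     # the trace does not
print(det(C) == det(A))         # nor does the determinant
```

Note that `det` and `trace` use only sums, differences, and products, whereas `inverse` needs division.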


Exercises 


The basis-independence of the determinant follows from basic properties 
of matrices and determinants, such as det(AB) = det(A) det(B). They are 
probably familiar from an introductory linear algebra course, so we will 
assume them for this chapter. However, we will discuss determinants more 
fully in the next chapter (and revisit these exercises in Section 7.4), because 
determinants are really too remarkable to be taken for granted. 


1. Show that the change of basis effected by a matrix $B$ causes the
transformation expressed by matrix $A$ to be expressed by matrix $BAB^{-1}$.

2. Show also that $\det(B)^{-1} = \det(B^{-1})$.

3. Hence, conclude that $\det(BAB^{-1}) = \det(A)$.


6.4 Algebraic Numbers as Matrices 


In Section 4.4 we saw how algebraic numbers are represented “concretely” — in 
terms of rational numbers — as congruence classes of polynomials with rational 
coefficients. Another way to realize algebraic numbers “concretely,” in this 
sense, is by means of matrices. We mention this here because it involves an 
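interesting example of a linear map, which we generalize in Section 7.5.

A simple example of a matrix that behaves like an algebraic number is

\[ \mathbf{i} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]

Multiplying $\mathbf{i}$ by itself we get

\[ \mathbf{i}^2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -\mathbf{1}, \]

where $\mathbf{1}$ denotes the $2 \times 2$ identity matrix. Thus, in the world of $2 \times 2$ matrices,
$\mathbf{i}$ is a solution of the equation $x^2 = -\mathbf{1}$, or $x^2 + \mathbf{1} = 0$.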
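Now suppose we have an algebraic number $\alpha$, with minimal equation

\[ x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0, \quad \text{where } a_0, \dots, a_{n-1} \in \mathbb{Q}. \tag{*} \]

Part 3 of the proof in Section 4.4 shows that $\mathbb{Q}(\alpha)$ equals the polynomial ring
$\mathbb{Q}[\alpha]$, which in turn consists of all linear combinations of $1, \alpha, \alpha^2, \dots, \alpha^{n-1}$
with coefficients in $\mathbb{Q}$, since any polynomial is congruent to one of degree less
than $n$, mod $p(x)$, where $p(x)$ is the polynomial on the left side of (*).
For example $\alpha^n = -a_{n-1}\alpha^{n-1} - \cdots - a_1\alpha - a_0$.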


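Now the operation $m_\alpha$ of multiplication by $\alpha$ is a $\mathbb{Q}$-linear map because

\[ m_\alpha(ru + sv) = r\,m_\alpha(u) + s\,m_\alpha(v) \quad \text{for any } r, s \in \mathbb{Q} \text{ and } u, v \in \mathbb{Q}(\alpha). \]

Thus, $m_\alpha$ is determined by what it does to the $n$-tuple $1, \alpha, \alpha^2, \dots, \alpha^{n-1}$.
Indeed, when $n$-tuples are written as column vectors, $m_\alpha$ sends

\[ \begin{pmatrix} 1 \\ \alpha \\ \alpha^2 \\ \vdots \\ \alpha^{n-1} \end{pmatrix} \quad \text{to} \quad \begin{pmatrix} \alpha \\ \alpha^2 \\ \alpha^3 \\ \vdots \\ -a_{n-1}\alpha^{n-1} - \cdots - a_1\alpha - a_0 \end{pmatrix}. \]

This allows us to view multiplication by $\alpha$ as the matrix

\[ \boldsymbol{\alpha} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{pmatrix}, \]

because left multiplication of the $n$-tuple $(1, \alpha, \alpha^2, \dots, \alpha^{n-1})$ by $\boldsymbol{\alpha}$ yields the
required $n$-tuple $(\alpha, \alpha^2, \alpha^3, \dots, -a_{n-1}\alpha^{n-1} - \cdots - a_1\alpha - a_0)$.

It follows, since composition of $m_\alpha$ with itself corresponds to multiplication
by powers of $\alpha$, and scalar multiples and sums of powers correspond to matrix
scalar multiples and sums, that matrix polynomials in $\boldsymbol{\alpha}$ behave exactly the
same as polynomials in $\alpha$. In particular,

\[ \boldsymbol{\alpha}^n + a_{n-1}\boldsymbol{\alpha}^{n-1} + \cdots + a_1\boldsymbol{\alpha} + a_0\mathbf{1} = \mathbf{0} \quad \text{(the zero matrix).} \]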


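For example, the (transpose of the) matrix

\[ \boldsymbol{\sqrt{2}} = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \]

for which $a_0 = -2$, ought to satisfy the equation $x^2 - 2 \cdot \mathbf{1} = 0$. And indeed,

\[ \boldsymbol{\sqrt{2}}^{\,2} = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2 \cdot \mathbf{1}. \]

Notice that the matrix $\boldsymbol{\alpha}$ has rational entries if $\alpha$ is an algebraic number,
and indeed, integer entries if $\alpha$ is an algebraic integer, because the minimal
equation (*) for $\alpha$ has integer coefficients in the latter case, by the corollary
in Section 4.7. In particular, we can represent the number $\sqrt{2}$ by the integer
matrix

\[ \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}. \]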
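
The construction of this section can be tried out directly. The following is a minimal sketch in which the function names `companion`, `matmul`, and `evaluate_minimal_equation` are illustrative, not from the text. It builds the matrix with last row $-a_0, \dots, -a_{n-1}$ from the coefficients of a monic equation and checks that the matrix satisfies that equation, for $x^2 + 1$ and for $x^3 - 2$.

```python
def companion(coeffs):
    """Matrix for x^n + a_{n-1}x^{n-1} + ... + a_0, given coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    # Rows 1 to n-1 carry a single 1 just above the diagonal.
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    # The last row is -a_0, -a_1, ..., -a_{n-1}.
    rows.append([-a for a in coeffs])
    return rows

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def evaluate_minimal_equation(coeffs):
    """Return M^n + a_{n-1}M^{n-1} + ... + a_1 M + a_0 * 1 for M = companion(coeffs)."""
    n = len(coeffs)
    M = companion(coeffs)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # M^0
    total = [[0] * n for _ in range(n)]
    for a in coeffs:
        total = [[t + a * p for t, p in zip(trow, prow)]
                 for trow, prow in zip(total, power)]
        power = matmul(power, M)
    # After the loop, power equals M^n, the leading term.
    return [[t + p for t, p in zip(trow, prow)] for trow, prow in zip(total, power)]

print(companion([-2, 0]))                    # x^2 - 2: transpose of the matrix for sqrt 2
print(evaluate_minimal_equation([1, 0]))     # x^2 + 1
print(evaluate_minimal_equation([-2, 0, 0])) # x^3 - 2
```

Because all entries are integers when the coefficients are, the check for an algebraic integer involves no fractions at all.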


Exercises 


Representation of algebraic numbers by matrices allows us to approach 
their norms via the concept of determinant. From this point of view, the 
multiplicative property of norm follows from the multiplicative property of 
determinant: det(A B) = det(A) det(B). 


1. Show that the complex number $a + bi$ (where $a, b \in \mathbb{Q}$ or $\mathbb{R}$) is represented
by the matrix

\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \quad \text{whose determinant equals } a^2 + b^2 = |a + bi|^2. \]

2. Show that the element $a + b\sqrt{2}$ of $\mathbb{Q}(\sqrt{2})$ is represented by the matrix,
with $a, b \in \mathbb{Q}$,

\[ \begin{pmatrix} a & 2b \\ b & a \end{pmatrix}, \quad \text{whose determinant equals } a^2 - 2b^2. \]

3. Conclude that the norm $a^2 - 2b^2$ in $\mathbb{Q}(\sqrt{2})$ is multiplicative.

4. Using the matrix for multiplication by $2^{1/3}$ in $\mathbb{Q}(2^{1/3})$, show that the
element $a + b \cdot 2^{1/3} + c \cdot 2^{2/3}$ of $\mathbb{Q}(2^{1/3})$ is represented by the matrix

\[ \begin{pmatrix} a & 2c & 2b \\ b & a & 2c \\ c & b & a \end{pmatrix}. \]

5. Hence, show that the norm of $a + b \cdot 2^{1/3} + c \cdot 2^{2/3}$ is
$a^3 + 2b^3 + 4c^3 - 6abc$.


The multiplicative property of det is remarkably powerful, as can be seen in
the case of the quaternion matrices

\[ \begin{pmatrix} a + bi & c + di \\ -c + di & a - bi \end{pmatrix}, \]

whose determinant equals $a^2 + b^2 + c^2 + d^2$.


6. Writing each quaternion matrix in the form

\[ \begin{pmatrix} \alpha & \beta \\ -\overline{\beta} & \overline{\alpha} \end{pmatrix}, \quad \text{where } \alpha = a + bi \text{ and } \beta = c + di, \]

check that the product of quaternion matrices is a quaternion matrix.

7. Deduce, because $\det(Q_1)\det(Q_2) = \det(Q_1Q_2)$, that there is a four
square identity, expressing the product $(a_1^2 + b_1^2 + c_1^2 + d_1^2)(a_2^2 + b_2^2 + c_2^2 + d_2^2)$
as a sum of four squares. Such an identity was first
discovered by Euler (1748).
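
Before attempting exercises 6 and 7, it may help to see a single numerical instance. The sketch below is only a spot check with hypothetical integer inputs, not a solution: it multiplies two quaternion matrices using Python's built-in complex numbers, confirms that the product has the quaternion form, and reads off four squares whose sum is the product of the two determinants. The function names are illustrative.

```python
def quaternion_matrix(a, b, c, d):
    """The 2 x 2 complex matrix with alpha = a + bi and beta = c + di."""
    alpha, beta = complex(a, b), complex(c, d)
    return [[alpha, beta], [-beta.conjugate(), alpha.conjugate()]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

Q1 = quaternion_matrix(1, 2, 3, 4)   # determinant 1 + 4 + 9 + 16 = 30
Q2 = quaternion_matrix(5, 6, 7, 8)   # determinant 25 + 36 + 49 + 64 = 174
P = matmul(Q1, Q2)

# The top row of the product supplies the four numbers to be squared.
four_squares = [P[0][0].real, P[0][0].imag, P[0][1].real, P[0][1].imag]

print(four_squares)
print(sum(x * x for x in four_squares), det(Q1) * det(Q2))
```

With small integer inputs the floating-point arithmetic is exact, so the equalities can be tested directly.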


6.5 The Theorem of the Primitive Element 


If $\alpha$ is an algebraic number of degree $n$, we know that the field $\mathbb{Q}(\alpha)$ is of
dimension $n$ over $\mathbb{Q}$, with basis $1, \alpha, \dots, \alpha^{n-1}$. There is a converse to this
result, called the theorem of the primitive element. It states that if $E \supseteq \mathbb{Q}$ is
an extension of dimension $n$, then $E = \mathbb{Q}(\alpha)$ for some algebraic number $\alpha$ of
degree $n$, and hence $E$ is a vector space over $\mathbb{Q}$ with basis $1, \alpha, \dots, \alpha^{n-1}$.

To prove the theorem, it is convenient to use derivatives of polynomials in 
F [x], where F is an algebraic number field. 


Definition. For any polynomial $p(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$, we
define its derivative to be $p'(x) = na_nx^{n-1} + (n-1)a_{n-1}x^{n-2} + \cdots + a_1$.


Thus, the derivative $p'(x)$ is the same polynomial one finds by calculus
when $p(x)$ is in $\mathbb{R}[x]$ or $\mathbb{C}[x]$. In algebra the derivative of a polynomial is
introduced by definition, because we may be working over a field $F$ where
the processes of calculus do not make sense. Fortunately, we can also give
algebraic proofs of the usual rules of differentiation, as long as they are
restricted to polynomials. For example, we can prove the product rule:


For polynomials $p(x), q(x), r(x)$, if $p(x) = q(x)r(x)$, then
\[ p'(x) = q'(x)r(x) + q(x)r'(x). \]


This is done by substituting general polynomials for q(x) and r(x), then 
working out both sides of the equation according to the definition. 

The derivative $p'(x)$ is involved in deciding whether a polynomial $p(x)$ has
a multiple root. Since $F$ is an algebraic number field, we can assume the roots
of any polynomial in $F[x]$ exist in $\mathbb{C}$, by the fundamental theorem of algebra.


Multiple root criterion. A polynomial $p(x) \in F[x]$ has a multiple root if and
only if $\gcd(p(x), p'(x)) \neq 1$.


Proof. If $\alpha$ is a root of $p(x)$, then $p(x) = (x - \alpha)q(x)$ for some polynomial
$q(x)$. Now notice that $p'(x) = q(x) + (x - \alpha)q'(x)$ by the product rule.
Consequently,

\begin{align*}
\alpha \text{ is a multiple root of } p(x) &\Leftrightarrow x - \alpha \text{ divides } q(x) \\
&\Leftrightarrow x - \alpha \text{ divides } p'(x) - (x - \alpha)q'(x) \\
&\Leftrightarrow x - \alpha \text{ divides } p'(x), \text{ by the product rule} \\
&\Leftrightarrow x - \alpha \text{ divides } p(x) \text{ and } p'(x).
\end{align*}

So, if $p(x)$ has a multiple root in $\mathbb{C}$, then $\gcd(p(x), p'(x)) \neq 1$, because $p(x)$
and $p'(x)$ have a common divisor $x - \alpha$ for each multiple root $\alpha$.

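Conversely, if $\gcd(p(x), p'(x)) \neq 1$, then the gcd is a nonconstant polynomial and
so has a root $\alpha$ in $\mathbb{C}$. Then $x - \alpha$ divides both $p(x)$ and $p'(x)$, so $\alpha$ is a
multiple root of $p(x)$ by the equivalences above. Equivalently, if
$\gcd(p(x), p'(x)) = 1$, then there is no common divisor $x - \alpha$,
hence no multiple root. □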


Corollary. An irreducible $p(x) \in F[x]$ has no multiple root.


Proof. By the multiple root criterion, we must show $\gcd(p(x), p'(x)) = 1$ for
an irreducible $p(x)$. Since $p(x)$ is irreducible, it can have a common divisor
with $p'(x)$ only if $p(x)$ divides $p'(x)$. This is impossible, since $p'(x)$ has lower
degree, unless $p'(x) = 0$.

But, looking at the definition of $p'(x)$, for a nonconstant polynomial $p(x)$,
we see that $p'(x) \neq 0$. □
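
The multiple root criterion is effective: the gcd can be computed by the Euclidean algorithm without finding any roots. The following is a minimal sketch over $\mathbb{Q}$, assuming polynomials are stored as coefficient lists with the constant term first; all function names are illustrative.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients from the high-degree end."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def derivative(p):
    """Formal derivative: the coefficient of x^k becomes k times itself, shifted down."""
    return trim([k * c for k, c in enumerate(p)][1:])

def poly_mod(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a = trim(list(a))
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        a = trim(a)
    return a

def poly_gcd(a, b):
    """Monic gcd by the Euclidean algorithm (a and b not both zero)."""
    a, b = trim(list(a)), trim(list(b))
    while b:
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a]

def has_multiple_root(p):
    p = [Fraction(c) for c in p]
    # gcd(p, p') is nonconstant exactly when its coefficient list has length > 1.
    return len(poly_gcd(p, derivative(p))) > 1

print(has_multiple_root([-2, 0, 1]))      # x^2 - 2, irreducible over Q
print(has_multiple_root([2, -3, 0, 1]))   # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
```

In the second example the gcd comes out as $x - 1$, which locates the repeated factor.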


Now, finally, we are ready to prove the theorem. The condition $E \subseteq \mathbb{C}$
enters to ensure that irreducible polynomials have distinct linear factors.

Theorem of the primitive element. If $E \supseteq \mathbb{Q}$ is an extension of finite degree,
and $E \subseteq \mathbb{C}$, then $E = \mathbb{Q}(\alpha)$ for some $\alpha \in E$.


Proof. If $E$ is of degree $n$ over $\mathbb{Q}$, then $E = \mathbb{Q}(\alpha_1, \dots, \alpha_n)$ where $\alpha_1, \dots, \alpha_n$
are basis elements for $E$ over $\mathbb{Q}$. We argue by induction on $n$. If $n = 1$,
then the theorem holds with $\alpha = \alpha_1$. For the induction step we can suppose
$\mathbb{Q}(\alpha_1, \dots, \alpha_{n-1}) = \mathbb{Q}(\beta)$ for some $\beta \in E$, and it remains to find $\alpha$ so that
$\mathbb{Q}(\beta, \alpha_n) = \mathbb{Q}(\alpha)$.

Writing $\alpha_n$ as $\gamma$, our problem boils down to the following: given $\beta, \gamma$ that
are algebraic over $\mathbb{Q}$, we wish to find $\alpha$ so that $\mathbb{Q}(\beta, \gamma) = \mathbb{Q}(\alpha)$.

We seek an element $\alpha \in \mathbb{Q}(\beta, \gamma)$ of the form $\alpha = \gamma + c\beta$ where $c \in \mathbb{Q}$. It
then suffices to show that $\beta \in \mathbb{Q}(\alpha)$, because then $\gamma = \alpha - c\beta \in \mathbb{Q}(\alpha)$ too.
To do this we will show that the minimal monic polynomial for $\beta$ over $\mathbb{Q}(\alpha)$ is
$x - \beta$. This comes from looking at

    $g(x)$ = minimal monic polynomial for $\beta$ over $\mathbb{Q}$,

    $h(x)$ = minimal monic polynomial for $\gamma$ over $\mathbb{Q}$.

We initially work in $\mathbb{C}$, which contains all the roots of $g(x)$ and $h(x)$. Then,
since roots correspond to linear factors, over $\mathbb{C}$ we have

\[ g(x) = (x - \beta_1) \cdots (x - \beta_m), \quad \text{where } \beta_1, \dots, \beta_m \text{ are the solutions of } g(x) = 0, \]

\[ h(x) = (x - \gamma_1) \cdots (x - \gamma_n), \quad \text{where } \gamma_1, \dots, \gamma_n \text{ are the solutions of } h(x) = 0. \]

Now $g(x)$ has $\beta$ among its roots $\beta_1, \dots, \beta_m$ by definition. Similarly, for $c \neq 0$,
$h(\alpha - cx)$ also has the root $\beta$, because $\gamma = \alpha - c\beta$, and $\gamma$ is among the roots
$\gamma_1, \dots, \gamma_n$ of $h(x)$. This suggests that we can find the polynomial $x - \beta$ as the
greatest common divisor of $g(x)$ and $h(\alpha - cx)$, provided we can choose $c$ so
that $\beta$ is the only common root of $g(x)$ and $h(\alpha - cx)$.

Without loss of generality we can assume $\beta = \beta_1$ and $\gamma = \gamma_1$. Then, since
$g(x)$ and $h(x)$ have no multiple roots by the corollary above, we have $\beta_i \neq \beta_1$
when $i \neq 1$ and $\gamma_j \neq \gamma_1$ when $j \neq 1$. Since $\alpha = c\beta + \gamma$, it suffices to ensure
that

\[ c\beta_1 + \gamma_1 \neq c\beta_i + \gamma_j \quad \text{for } i \neq 1,\ j \neq 1. \]

This means that

\[ c \neq \frac{\gamma_j - \gamma_1}{\beta_1 - \beta_i}, \]

and we can avoid these finitely many values because $\mathbb{Q}$ is infinite.

Thus, we can indeed choose $c$ so that $\beta$ is the only common root of $g(x)$
and $h(\alpha - cx)$, which means that

\[ x - \beta = \gcd(g(x), h(\alpha - cx)). \]

Now $g(x) \in \mathbb{Q}[x]$ and $h(\alpha - cx) \in \mathbb{Q}(\alpha)[x]$, because $c \in \mathbb{Q}$ and $\alpha \in \mathbb{Q}(\alpha)$.
So in fact both polynomials are in $\mathbb{Q}(\alpha)[x]$, and hence the Euclidean algorithm
for their gcd gives a polynomial with coefficients in $\mathbb{Q}(\alpha)$. Thus, $\beta \in \mathbb{Q}(\alpha)$, as
required. □
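
The choice of $c$ in this proof can be imitated numerically. The sketch below is a rough illustration only: it assumes the conjugates of $\beta$ and $\gamma$ are supplied as floating-point (or complex) approximations, with $\beta_1$ and $\gamma_1$ listed first, and it compares numbers up to a tolerance `tol`, so it is not an exact method. The function names and the two examples are hypothetical.

```python
import math

def bad_values(beta_roots, gamma_roots):
    """The finitely many values (gamma_j - gamma_1) / (beta_1 - beta_i) with i, j != 1."""
    b1, g1 = beta_roots[0], gamma_roots[0]
    return [(gj - g1) / (b1 - bi)
            for bi in beta_roots[1:] for gj in gamma_roots[1:]]

def choose_c(beta_roots, gamma_roots, tol=1e-9):
    """Smallest positive integer c that avoids every bad value (up to tol)."""
    bad = bad_values(beta_roots, gamma_roots)
    c = 1
    while any(abs(c - b) < tol for b in bad):
        c += 1
    return c

r2, r3 = math.sqrt(2), math.sqrt(3)

# beta = sqrt 2, gamma = sqrt 3: the only bad value is irrational, so c = 1 works.
print(choose_c([r2, -r2], [r3, -r3]))

# beta = sqrt 2, gamma = -sqrt 2: c = 1 is bad, since gamma + beta = 0
# generates only Q, so the search moves on to c = 2.
print(choose_c([r2, -r2], [-r2, r2]))
```

The second example shows why some values of $c$ must be excluded: with $c = 1$ the element $\gamma + c\beta$ collapses to 0, whereas $c = 2$ gives $\sqrt{2}$.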


Exercises 


A field whose primitive element can be found directly is $\mathbb{Q}(\sqrt{2}, \sqrt{3})$, obtained
from $\mathbb{Q}(\sqrt{2})$ by adjoining the element $\sqrt{3}$. We begin by recalling from
exercise 1 of Section 4.5 that $\sqrt{3}$ is not in $\mathbb{Q}(\sqrt{2})$.


1. Deduce that $\mathbb{Q}(\sqrt{2}, \sqrt{3})$ is of dimension 2 over $\mathbb{Q}(\sqrt{2})$, and hence of
dimension 4 over $\mathbb{Q}$.

2. Show, by rational operations on $\alpha = \sqrt{2} + \sqrt{3}$ (including division), that
$\sqrt{2} \in \mathbb{Q}(\sqrt{2} + \sqrt{3})$, and hence that $\sqrt{3} \in \mathbb{Q}(\sqrt{2} + \sqrt{3})$.

3. Conclude that $\sqrt{2} + \sqrt{3}$ is a primitive element for $\mathbb{Q}(\sqrt{2}, \sqrt{3})$, and hence
that $\sqrt{2} + \sqrt{3}$ is of degree 4.

4. Show that $\sqrt{2} + \sqrt{3}$ satisfies the polynomial $x^4 - 10x^2 + 1$.

5. Deduce that $x^4 - 10x^2 + 1$ is irreducible in $\mathbb{Q}[x]$.


6.6 Algebraic Number Fields and Embeddings in C 


Now that we know that each algebraic number field $E$ has the simple form
$\mathbb{Q}(\alpha)$, we can relate the possible realizations of $E$ in the complex numbers $\mathbb{C}$
to the roots of the minimal polynomial $p(x)$ for $\alpha$. We saw in Section 4.4 that
for each root $\alpha_i$ of $p(x)$ there is an isomorphism $\sigma_i : \mathbb{Q}(\alpha) \to \mathbb{C}$ sending $\alpha$
to $\alpha_i$. We know from Section 6.5 that the roots of $p(x)$ are distinct, so if $E$ is of
degree $n$, there are $n$ distinct isomorphisms $\sigma_i$.

Conversely, any isomorphism $\sigma : \mathbb{Q}(\alpha) \to \mathbb{C}$ is determined by the value of
$\sigma(\alpha)$, which must be a root of $p(x)$ because

\[ 0 = \sigma(0) = \sigma(p(\alpha)) = p(\sigma(\alpha)) \quad \text{since $\sigma$ is an isomorphism.} \]

Thus, $\sigma(\alpha)$ equals some $\alpha_i$, in which case $\sigma = \sigma_i$.

We will call the isomorphisms of a field $E$ into $\mathbb{C}$ its embeddings in $\mathbb{C}$.

We are interested in the field $F = \mathbb{Q}(\beta)$ generated by an element $\beta \in E =
\mathbb{Q}(\alpha)$, in which case we have fields $E \supseteq F \supseteq \mathbb{Q}$. One thing we already know
about this situation is the Dedekind product theorem of Section 6.2, which tells
us that the relative dimensions satisfy

\[ [E : \mathbb{Q}] = [E : F][F : \mathbb{Q}]. \]

An immediate consequence is that the degree $[F : \mathbb{Q}]$ of $\beta$ divides the degree
$[E : \mathbb{Q}] = n$ of $\alpha$. More generally, the conjugates of $\beta$ are related to the values
$\sigma_i(\beta)$ as follows:


Conjugates of $\beta \in \mathbb{Q}(\alpha)$. The values $\sigma_i(\beta)$ are the $[F : \mathbb{Q}]$ conjugates of $\beta$,
each occurring $[E : F]$ times.


Proof. Suppose $g(x)$ is the minimal polynomial for $\beta$, with degree
$m = [F : \mathbb{Q}]$. Then we know that there are $m$ embeddings $\tau_j : F \to \mathbb{C}$ by
Section 4.4. Next observe that each of these embeddings $\tau_j$ can be extended to
an embedding $\sigma : E \to \mathbb{C}$ by viewing $E$ as a vector space over $F$.

In fact, viewing $E$ as $F(\gamma)$, by the theorem of the primitive element, we
see there are exactly $n/m$ such extensions of each $\tau_j$. This is because $\gamma$ is of
degree $n/m$ over $F$, by the Dedekind product theorem, so $\gamma$ can be sent to any of its
$n/m$ conjugates by an extension of $\tau_j$.

This gives $n$ embeddings $E \to \mathbb{C}$, which must be the $n$ embeddings $\sigma_i$.
It also shows that each of the $m$ distinct values $\tau_j(\beta)$ occurs $[E : F] = n/m$
times among the $n$ values $\sigma_i(\beta)$. □


Corollary. The minimal polynomial $q(x)$ of $\beta$ satisfies
\[ q(x)^{[E:F]} = (x - \sigma_1(\beta)) \cdots (x - \sigma_n(\beta)). \]


Proof. Each side is a monic polynomial of degree $n$ with roots $\sigma_1(\beta), \dots, \sigma_n(\beta)$. □
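
The corollary can be checked numerically in a small case. The sketch below takes $E = \mathbb{Q}(\sqrt{2}, \sqrt{3})$ and assumes, as an input not derived here, that its four embeddings are obtained by independently changing the signs of $\sqrt{2}$ and $\sqrt{3}$. It works in floating point and rounds at the end, so it is an illustration rather than a proof; the function names are hypothetical. For $\beta = \sqrt{2}$ we have $F = \mathbb{Q}(\sqrt{2})$, $[E : F] = 2$, and minimal polynomial $x^2 - 2$.

```python
import math
from itertools import product

def embeddings_of(elem):
    """Images of a + b*sqrt2 + c*sqrt3 + d*sqrt6 under the four sign choices."""
    a, b, c, d = elem
    r2, r3 = math.sqrt(2), math.sqrt(3)
    return [a + s * b * r2 + t * c * r3 + s * t * d * r2 * r3
            for s, t in product((1, -1), repeat=2)]

def poly_from_roots(roots):
    """Coefficients (highest degree first) of the monic polynomial with these roots."""
    coeffs = [1.0]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [x - r * y for x, y in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

beta = (0, 1, 0, 0)  # sqrt 2
values = embeddings_of(beta)
rounded = [round(c) for c in poly_from_roots(values)]

print(values)    # sqrt 2 and -sqrt 2, each occurring twice
print(rounded)   # coefficients of (x^2 - 2)^2 = x^4 - 4x^2 + 4
```

Each of the two conjugates of $\sqrt{2}$ appears $[E : F] = 2$ times, and the product of the four linear factors is the square of the minimal polynomial.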


Exercises 


1. Show, by finding the roots of $x^3 = \alpha^3$ where $\alpha = 2^{1/3}$, that the conjugates
of $2^{1/3}$ are $2^{1/3}\zeta$ and $2^{1/3}\zeta^2$, where

\[ \zeta = \frac{-1 + \sqrt{-3}}{2}. \]

2. Deduce that the general element $a + b \cdot 2^{1/3} + c \cdot 2^{2/3}$ of $\mathbb{Q}(2^{1/3})$ has the
conjugates

\[ a + b \cdot 2^{1/3}\zeta + c \cdot 2^{2/3}\zeta^2 \quad \text{and} \quad a + b \cdot 2^{1/3}\zeta^2 + c \cdot 2^{2/3}\zeta. \]

3. Show that the conjugates of $\sqrt{2} + \sqrt{3}$ are

\[ \sqrt{2} - \sqrt{3}, \quad -\sqrt{2} + \sqrt{3}, \quad -\sqrt{2} - \sqrt{3} \]

by showing that all these numbers satisfy the same minimal polynomial.

4. Hence, find the conjugates of the general element
$a + b\sqrt{2} + c\sqrt{3} + d\sqrt{6}$ of $\mathbb{Q}(\sqrt{2}, \sqrt{3})$.


6.7 Discussion


Linear algebra is an ancient subject that was not taken seriously until modern 
times. It originated as a discipline for solving linear equations and, since linear 
equations are the easiest to solve, linear algebra remained on the bottom rung of 
algebra as long as its main aim was thought to be solving equations. It was only 
when mathematicians turned their attention to structures, such as rings, fields, 
and vector spaces, that linear algebra was seen in a new light: as a necessary 
foundation for many parts of algebra, geometry, and analysis. 


6.7.1 Linear Equations 


The first important discovery in linear algebra was the algorithm for solving 
systems of linear equations we call “Gaussian elimination.” It was invented 
not by Gauss, but by Chinese mathematicians about 2000 years ago. Precisely 
when, we do not know, but it appears in the Nine Chapters of Mathematical Art, 
a famous compilation that was required reading for examination candidates 
during the Han dynasty (from around 200 BCE to 200 CE). 

As most readers will know, “Gaussian elimination” displays a set of n 
equations in n unknowns in a square array, such as 


\begin{align*}
x + y + z &= 1 \\
2x + y + 3z &= 0 \\
x - y + 2z &= 4,
\end{align*}

and subtracts multiples of equations from one another to reduce the system to
triangular form. Here we remove the $x$ term from the last two equations by
subtracting suitable multiples of the first equation from them, obtaining

\begin{align*}
x + y + z &= 1 \\
-y + z &= -2 \\
-2y + z &= 3.
\end{align*}

Then we subtract twice the second equation from the third, obtaining the
triangular system

\begin{align*}
x + y + z &= 1 \\
-y + z &= -2 \\
-z &= 7.
\end{align*}


Finally, we obtain the values of $z, y, x$ in that order by working from the bottom
up: $z = -7$, hence $y = -5$, hence $x = 13$.

The reduction to triangular form involves only the matrix of coefficients

\[ \begin{pmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & 3 & 0 \\ 1 & -1 & 2 & 4 \end{pmatrix}, \]

and indeed the Chinese carried out the process in exactly this way, representing
the matrix of numbers by counters on a board.
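
The elimination just described can be written as a short program operating on the array of coefficients. This is a minimal sketch, assuming every pivot met along the way is nonzero (so no rows need to be exchanged); the function name `solve` is illustrative. It uses exact rational arithmetic and reproduces the worked example.

```python
from fractions import Fraction

def solve(aug):
    """Solve n equations in n unknowns given as an n x (n+1) array.

    Assumes each pivot encountered is nonzero, so no row exchanges are needed.
    """
    n = len(aug)
    rows = [[Fraction(x) for x in row] for row in aug]
    # Reduce to triangular form by subtracting multiples of earlier rows.
    for k in range(n):
        for i in range(k + 1, n):
            factor = rows[i][k] / rows[k][k]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[k])]
    # Work from the bottom up to find the unknowns.
    sol = [Fraction(0)] * n
    for k in reversed(range(n)):
        known = sum(rows[k][j] * sol[j] for j in range(k + 1, n))
        sol[k] = (rows[k][n] - known) / rows[k][k]
    return sol

x, y, z = solve([[1, 1, 1, 1],
                 [2, 1, 3, 0],
                 [1, -1, 2, 4]])
print(x, y, z)
```

Only the numbers in the array are touched; the names of the unknowns play no role, which is the point of the matrix formulation.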

Thus, it has been clear since ancient times how to solve n linear equations 
in n unknowns. A more sophisticated question is whether there is a formula 
expressing the unknowns in terms of the coefficients and known operations. 
For small values of n such formulas were found by the Japanese mathematician 
Seki in 1683 and Leibniz in 1693. By 1708 Seki had found the underlying 
concept of determinant and defined it inductively. In Europe the general 
determinant concept was rediscovered by Cramer (1750) and used by him 
to give a general formula for the solution of linear equations. These nearly 
simultaneous discoveries underline the “rightness” of the determinant concept 
for the theory of linear equations. 

In the next chapter we will see that the determinant is indeed the “right” 
concept for explaining solvability of systems of linear equations. However, 
we will also see that the determinant is quite a complicated object. Thus, 
Cramer’s formula sent linear algebra down the difficult path of determinant 
theory, diverting it for more than a century from the simpler concepts of linear 
independence, basis, and dimension. 


6.7.2 Linear Spaces 


As the term “dimension” suggests, linear algebra can be viewed geometrically, 
particularly when the base field is R. The one-dimensional R-vector space is 
the line, the two-dimensional R-vector space is the plane, and so on. This idea 
was generalized to arbitrary n by Grassmann (1844). In one stroke he created 
an n-dimensional geometry in the form of the n-dimensional R-vector space. 
The idea of vector space was already novel, and Grassmann loaded it with 
the extra novelties of inner and outer products (of which the determinant was 
merely a special case). Add to this an obscure style of exposition, and the 
result was a book that Grassmann’s contemporaries were completely unable to 
understand. 

Such was the difficult birth of modern linear algebra. Grassmann (1862) 
made a second attempt, stripping down his idea to the minimum needed for 
n-dimensional Euclidean geometry: the vector space R” of real n-tuples with 
the inner product

\[ (x_1, x_2, \dots, x_n) \cdot (y_1, y_2, \dots, y_n) = x_1y_1 + x_2y_2 + \cdots + x_ny_n. \]


Figure 6.1 Seki Takakazu (1642-1708) and Gottfried Wilhelm Leibniz (1646-1716).
Courtesy of the Japan Academy and licensed under Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0),
respectively.


Figure 6.2 Hermann Grassmann (1809-1877) and Giuseppe Peano (1858-1932).
Licensed under Creative Commons Attribution-ShareAlike 4.0 International
License.


Grassmann even pointed out that the inner product is essentially equivalent 
to the Pythagorean theorem, but it didn’t help. His style was such that few 
mathematicians had the patience to read him. So his ideas — which included 
even basics such as invariance of dimension — languished until Peano (who 
also rescued Grassmann’s ideas on arithmetic) published axioms for vector 
spaces in Peano (1888).

Peano’s vector space axioms were the same as those we use today, except
that he specialized to the field F = R. The idea of varying the field grew from
the work of Dedekind, in the Supplements he wrote for his editions of Dirichlet
in 1871, 1879, and 1894. In Dedekind’s vector spaces F can be any algebraic
number field. The fully fledged notion of F-vector space had to wait until the
concept of field itself became established in full generality, which happened in
the paper of Steinitz (1910).


6.7.3 Linear Maps 


In algebra today it is commonplace to say that we study not only certain 
kinds of structure — such as rings, fields, or vector spaces — but also the 
structure-preserving maps between them: their homomorphisms. However, 
this is a fairly recent realization. In the evolution of an algebraic concept, the 
appropriate notion of homomorphism is often last to emerge, if only because 
we need to know precisely what the structure is before we can talk about 
preserving it. 

This is certainly how the concept of linear map emerged. Particular 
linear maps have been around for a long time; for example, in the form of 
substitutions or changes of variable, like those used by Lagrange to study 
quadratic forms (see Section 3.3). But linear maps were not considered as
mathematical objects until the 19th century, when the matrix notation of
Cayley (1855) gave the object a symbolic existence.


Figure 6.3 Arthur Cayley (1821-1895). Licensed under Creative Commons
Attribution-ShareAlike 4.0 International License.

Matrices quickly took on a life of their own, as Cayley moved to consider 
the sum and product of matrices and the matrix inverse. As Katz and Parshall 
(2014), p. 358, wrote 


Cayley’s paper represented a significant step in the evolution of algebra, for it 
showed how matrices could be understood as constructs, subject to many of the 
usual operations of algebra. 


But matrices are not the last word on linear maps. The matrix for a linear map
of a vector space V depends on the choice of basis for V and, ideally, one
would like to discuss linear maps without choosing a basis. The definition of
linear map in Section 6.3 suggests that basis-free linear algebra is possible.
Modern books, starting with Halmos (1942), have indeed taken this approach.
However, there is still something to be said for matrices: 


We share a philosophy about linear algebra: we think basis-free, we write 
basis-free, but when the chips are down we close the office door and compute with 
matrices like fury. 

(Irving Kaplansky, writing of himself and Halmos in Kaplansky (1991)). 


https://doi.org/10.1017/9781009004138.008 Published online by Cambridge University Press


7 


Determinant Theory 


Preview 


The determinant function is a concept of linear algebra frequently skimmed 
over or minimized in modern treatments of the subject. At the same time, 
books on algebraic number theory tend to assume sophisticated properties of 
the determinant — such as its relationship to trace, norm, and characteristic 
polynomial — to be already known from a basic linear algebra course. 

Under these circumstances it seems useful to develop the determinant 
concept from scratch and then transition to its applications in algebraic number 
fields. This is what we aim to do in this chapter, hopefully making the book 
more self-contained without greatly increasing its size. 

We begin with an elementary treatment of determinants, due to Artin 
(1942). His approach leads rapidly to methods for evaluating determinants, 
applications to linear equations, and to the all-important multiplicative 
property. 

We can then prove the invariance of the determinant under change of 
basis and deduce the basis-independence of trace, norm, and characteristic 
polynomial. This leads in turn to relations between the trace and norm of an 
algebraic number and the roots of its minimal polynomial. 

With these foundations we can then introduce the discriminant, which tests 
whether an n-tuple of members of a field F of degree n over Q is a basis for F. 
This paves the way for the study of integral bases in the next chapter. These 
generalize the concept of basis from vector spaces to certain kinds of modules, 
such as the algebraic number rings $\mathbb{Z}_E$.


Figure 7.1 Emil Artin (1898-1962) (used with permission of Tom Artin and his
siblings). 


7.1 Axioms for the Determinant 


There are many ways to define the determinant function and derive its basic 
properties, none of them completely straightforward. One that is elementary, 
yet algebraic in spirit, is given in Artin (1942), pages 12-20. We follow 
Artin’s approach in this section and the next, except that we assume only that 
matrix entries are from a ring R, because only ring operations are involved. In 
Section 7.3 we specialize to matrices with entries in a field or a domain. 

The first step is to impose certain requirements, or axioms, on a function 
D(A) of n x n matrices A that we expect the determinant to have. The 
axioms are chosen to be as simple as possible, yet capable (as we will see) of 
implying all properties of the determinant function. To state the axioms, we let 
$A_1, \ldots, A_n$ denote the columns of $A$, and view $D$ as a function $D(A_1, \ldots, A_n)$
of its columns. We write $D(\ldots, A_k, \ldots)$ or $D(\ldots, A_k, A_{k+1}, \ldots)$ when $D$ is
viewed as a function of $A_k$, or of $A_k$ and $A_{k+1}$, with the other columns held
constant.


Axiom 1. $D$ is linear on each column $A_k$; that is,
\[ D(\ldots, A_k + A_k', \ldots) = D(\ldots, A_k, \ldots) + D(\ldots, A_k', \ldots) \]
and
\[ D(\ldots, cA_k, \ldots) = c\,D(\ldots, A_k, \ldots) \quad \text{for each } c \in R. \]

Axiom 2. $D(\ldots, A_k, A_{k+1}, \ldots) = 0$ when $A_k = A_{k+1}$.

Axiom 3. $D(I) = 1$, where $I$ is the $n \times n$ identity matrix. In terms of columns,
$D(U_1, \ldots, U_n) = 1$, where $U_1, \ldots, U_n$ are the columns of $I$.


For the moment we are interested only in what follows from these axioms.
When enough consequences are known, we will know exactly what D looks 
like, and its existence will be easy to prove. 

One after another, we derive the following consequences. 


1. Putting $c = 0$ in Axiom 1, we get $D = 0$ when any column is zero.

2. By Axiom 1, if we add $cA_{k+1}$ to column $k$ we get
\begin{align*}
D(\ldots, A_k + cA_{k+1}, A_{k+1}, \ldots) &= D(\ldots, A_k, A_{k+1}, \ldots) + c\,D(\ldots, A_{k+1}, A_{k+1}, \ldots) \\
&= D(\ldots, A_k, A_{k+1}, \ldots),
\end{align*}
because $D(\ldots, A_{k+1}, A_{k+1}, \ldots) = 0$ by Axiom 2. Thus, the value of $D$ is
unchanged if we add a multiple of one column to an adjacent column.

3. By adding or subtracting (take $c = -1$) one column from another, we can
make the following changes in columns $k$ and $k + 1$.
\begin{align*}
A_k,\ A_{k+1} &\mapsto A_k,\ A_{k+1} + A_k && \text{adding column $k$ to column $k+1$,} \\
&\mapsto -A_{k+1},\ A_{k+1} + A_k && \text{subtracting column $k+1$ from column $k$,} \\
&\mapsto -A_{k+1},\ A_k && \text{adding column $k$ to column $k+1$.}
\end{align*}
Thus, $D$ is unaltered by this change, by consequence 2.

4. Multiplying the new column $k$ by $-1$, which reverses the sign of $D$ by
Axiom 1, we obtain $A_{k+1}, A_k$ in columns $k$ and $k + 1$. Thus, swapping
adjacent columns reverses the sign of $D$.

5. Any two columns $A_k$ and $A_l$ can be made adjacent by a sequence of swaps
of adjacent columns. So if $A_k = A_l$, then $D = 0$ by Axiom 2.

6. By adjacent swaps and applying Axiom 2 as in consequences 2 and 3, we
can similarly show $D$ is unchanged by adding a multiple of any column to
another, and $D$’s sign is reversed by swapping any two columns.

7. Any permutation $A_{\sigma(1)}, \ldots, A_{\sigma(n)}$ of the columns $A_1, \ldots, A_n$ is obtainable
by a sequence of swaps, so
\[ D(A_{\sigma(1)}, \ldots, A_{\sigma(n)}) = \pm D(A_1, \ldots, A_n). \]
In particular, by Axiom 3,
\[ D(U_{\sigma(1)}, \ldots, U_{\sigma(n)}) = \pm 1, \quad \text{since } D(U_1, \ldots, U_n) = 1. \]


With these facts we can compute the effect on $D(A_1, \ldots, A_n)$ of replacing
each $A_k$ by an arbitrary linear combination of columns:

\[ A_k' = b_{1k} A_1 + \cdots + b_{nk} A_n, \quad \text{where } b_{1k}, \ldots, b_{nk} \in R. \]

The idea is to expand $D(A_1', \ldots, A_n')$ into a sum of terms $\pm D(A_1, \ldots, A_n)$,
each multiplied by a product of constants $b_{ij}$, using Axiom 1 to break
up $A_1', A_2', \ldots$ in turn. At first we get terms $D(A_{i_1}, \ldots, A_{i_n})$ with arbitrary
subscripts $i_k$. Those with two equal subscripts vanish by consequence 5; the
others get a $\pm$ sign depending on the swaps needed to put the subscripts in the
order $1, 2, \ldots, n$.

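The process can be illustrated in the case $n = 2$, where
$A_1' = b_{11}A_1 + b_{21}A_2$ and $A_2' = b_{12}A_1 + b_{22}A_2$:

\begin{align*}
D(A_1', A_2') &= D(b_{11}A_1 + b_{21}A_2,\ A_2') \\
&= b_{11} D(A_1, A_2') + b_{21} D(A_2, A_2') \\
&= b_{11} D(A_1,\ b_{12}A_1 + b_{22}A_2) + b_{21} D(A_2,\ b_{12}A_1 + b_{22}A_2) \\
&= b_{11}b_{12} D(A_1, A_1) + b_{11}b_{22} D(A_1, A_2) \\
&\quad + b_{21}b_{12} D(A_2, A_1) + b_{21}b_{22} D(A_2, A_2) \\
&= b_{11}b_{22} D(A_1, A_2) + b_{12}b_{21} D(A_2, A_1) \\
&= b_{11}b_{22} D(A_1, A_2) - b_{12}b_{21} D(A_1, A_2) \\
&= (b_{11}b_{22} - b_{12}b_{21}) D(A_1, A_2).
\end{align*}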


In the general case $D(A_1', \ldots, A_n')$ is the sum of terms

\[ b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)} D(A_{\sigma(1)}, \ldots, A_{\sigma(n)}), \]

over all permutations $\sigma$ of $1, 2, \ldots, n$, because $D$ is zero on any sequence of $n$
columns $A_i$ with a repeated column. It follows, by swapping columns $A_i$ until
they are in order, that

\[ D(A_1', \ldots, A_n') = \sum_\sigma \pm D(A_1, \ldots, A_n)\, b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)}, \qquad (*) \]

in which each sign ($+$ or $-$) is determined by the permutation $\sigma$.

The formula (*) was derived from Axioms 1 and 2 alone. If we now take $A$
to be the identity matrix and assume Axiom 3, then $D(A_1, \ldots, A_n) = 1$ and
also $A_k' = B_k$, the $k$th column of the matrix $B$ of the coefficients $b_{ij}$. This
gives a formula for $D$ of an arbitrary matrix $B$:

\[ D(B_1, \ldots, B_n) = \sum_\sigma \pm b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)}. \qquad (**) \]

So, if $D$ exists, it is uniquely determined by Axioms 1, 2, and 3.
In the light of the formula (**), formula (*) becomes

\[ D(A_1', \ldots, A_n') = D(A_1, \ldots, A_n)\, D(B_1, \ldots, B_n). \qquad ({*}{*}{*}) \]

Thus $D$ has the multiplicative property, $D(AB) = D(A)D(B)$, since the
matrix $A'$ with columns $A_k'$ is none other than the product of the matrices $A$
and $B$: the equations expressing the $A_k'$ in terms of $A_1, \ldots, A_n$ specifically
give

\[ \text{$j$th entry in column } A_k' = b_{1k} a_{j1} + b_{2k} a_{j2} + \cdots + b_{nk} a_{jn}, \]

which is the product of the $j$th row of $A$ with the $k$th column of $B$.
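
For small matrices, formula (**) and the multiplicative property can be checked directly. The following Python sketch is illustrative only: the names `sign`, `det_leibniz`, and `mat_mul` are our own, matrices are assumed to be lists of rows with integer entries, indices run from 0 rather than 1, and the sign of each permutation is found by counting inversions (the notion introduced in the exercises below).

```python
from itertools import permutations

def sign(perm):
    """+1 if perm (a tuple of 0..n-1) has an even number of inversions, else -1."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(b):
    """Formula (**): sum over sigma of sign(sigma) * b[0][sigma(0)] * ... * b[n-1][sigma(n-1)]."""
    n = len(b)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for row in range(n):
            term *= b[row][perm[row]]
        total += term
    return total

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B = [[1, 2, 0], [-1, 1, 3], [2, 0, 1]]
# The multiplicative property D(AB) = D(A) D(B), checked on one example.
assert det_leibniz(mat_mul(A, B)) == det_leibniz(A) * det_leibniz(B)
```

The sum has $n!$ terms, so this is only usable for very small $n$; it is meant as a check on the formula, not as a practical method.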


Exercises 


The terms $\pm D(A_1, \ldots, A_n)\, b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)}$ whose sum is
$D(A_1', \ldots, A_n')$ can be described in a little more detail.


1. Prove by induction on $n$ that, for each permutation $\sigma$ of $1, \ldots, n$, the term
$b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)}$ occurs exactly once in the expansion of
$D(A_1', \ldots, A_n')$.


It is of particular interest to see how the sign of $b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)}$
depends on $\sigma$. To do this we distinguish between “even” and “odd” permutations
$\sigma$. A permutation $\sigma$ is called even if it has an even number of inversions;
that is, an even number of pairs $i < j$ such that $\sigma(i) > \sigma(j)$. Otherwise, $\sigma$ is
called odd.


2. Why is the identity permutation $\sigma(i) = i$ even?

3. Show that interchange of $\sigma(i)$ and $\sigma(j)$ (a transposition) changes the
permutation $\sigma$ from even to odd or else from odd to even.

4. Show that any permutation is obtainable from the identity permutation by
a finite sequence of transpositions.

5. Conclude that $\sigma$ is even if and only if it is a product of an even number of
transpositions.

6. Show that the sign of $b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)}$ is $+$ if $\sigma$ is even and $-$ if not.
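
The two descriptions of parity in these exercises can be compared experimentally before proving that they agree. The sketch below is only an illustration under our own conventions: the helper names are hypothetical, permutations are written as tuples of $0, \ldots, n-1$ rather than $1, \ldots, n$, and the transposition count is that of one particular sorting procedure (other sequences of transpositions may have different lengths, but the same parity).

```python
from itertools import permutations

def inversions(perm):
    """Number of pairs i < j with perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def transpositions_to_sort(perm):
    """Count the swaps used to bring perm to the identity, one value at a time."""
    p = list(perm)
    swaps = 0
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]  # one transposition puts the value j into position j
            swaps += 1
    return swaps

# For every permutation of 4 symbols the two counts have the same parity.
for perm in permutations(range(4)):
    assert inversions(perm) % 2 == transpositions_to_sort(perm) % 2
```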


7.2 Existence of the Determinant Function 


If $A$ is an $n \times n$ matrix, we prove existence (and hence uniqueness) of a function
$D(A)$ satisfying Axioms 1, 2, and 3 by induction on $n$. For $n = 1$ we have
$A = (a_{11})$, and it is easy to check that $D(A) = a_{11}$ satisfies Axioms 1, 2,
and 3.

Now suppose that $D$ exists for all $(n-1) \times (n-1)$ matrices and consider
an $n \times n$ matrix $A$, with $(i, j)$ entry $a_{ij}$. Define the cofactor $A_{ij}$ of $a_{ij}$ to be $D$
of the $(n-1) \times (n-1)$ submatrix of $A$ obtained by deleting row $i$ and column
$j$, with a $\pm$ sign determined by the checkerboard pattern

\[ \begin{matrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \end{matrix} \]

(that is, the sign in position $(i, j)$ is $(-1)^{i+j}$). For any $i$ among $1, \ldots, n$ consider the function

\[ D(A) = a_{i1} A_{i1} + a_{i2} A_{i2} + \cdots + a_{in} A_{in}, \]

viewed as a function $D(A_1, \ldots, A_n)$ of the columns of $A$.


Determinant properties of $D$. $D$ satisfies Axioms 1, 2, and 3; hence it is the
unique determinant function and is independent of $i$.


Proof. If $A_k$ is replaced by $cA_k$, then $a_{ik}$ is replaced by $ca_{ik}$, but $A_{ik}$ is
unchanged because it does not involve the column $A_k$. The remaining $A_{ij}$ do
involve $A_k$, hence they are multiplied by $c$, while the corresponding $a_{ij}$ are unchanged. Hence,

\[ D(\ldots, cA_k, \ldots) = c\,D(\ldots, A_k, \ldots). \]

A similar argument shows that

\[ D(\ldots, A_k + A_k', \ldots) = D(\ldots, A_k, \ldots) + D(\ldots, A_k', \ldots), \]

so $D$ satisfies Axiom 1.

Next, if $A_k = A_{k+1}$, then $A_{ik}$ and $A_{i,k+1}$ are $D$ of the same matrix but with
opposite signs, while the other $A_{ij}$ have two equal columns and hence are zero
by consequence 5 in the previous section. Since also $a_{ik} = a_{i,k+1}$, the two
remaining terms cancel. Thus, $D = 0$ when $A_k = A_{k+1}$, so
Axiom 2 is satisfied.


Finally, if $A$ is the $n \times n$ identity matrix, then $a_{ii} = 1$ and its cofactor $A_{ii}$ is
$D$ of the $(n-1) \times (n-1)$ identity matrix, hence $A_{ii} = 1$ by induction. Every
other $a_{ij} = 0$, so $D(A) = 1$ and Axiom 3 is satisfied. □
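
The inductive definition translates directly into a recursive procedure. The sketch below is a minimal illustration, not an efficient method: `minor` and `det_cofactor` are names chosen here, matrices are assumed to be lists of rows (each row a list), and rows and columns are numbered from 0, so the checkerboard sign is still $(-1)^{i+j}$.

```python
def minor(a, i, j):
    """Submatrix of a with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det_cofactor(a, i=0):
    """Determinant by expansion on row i (rows and columns numbered from 0)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        checkerboard = -1 if (i + j) % 2 else 1
        # The smaller determinants are themselves expanded on their first row.
        total += a[i][j] * checkerboard * det_cofactor(minor(a, i, j))
    return total

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
# The value does not depend on the row chosen for the expansion.
assert det_cofactor(A, 0) == det_cofactor(A, 1) == det_cofactor(A, 2)
```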


7.2.1 Further Rules for Computing Determinants 


From now on we will call D by its usual name det, for determinant. 

The formula (**) in Section 7.1 in principle gives a rule for computing det,
though it is not very practical unless many entries are zero or one wants only
a particular term, such as the product $b_{11} b_{22} \cdots b_{nn}$ of the diagonal elements.
The latter is actually the case when we study the characteristic polynomial in
Section 7.4.

The above definition of det via cofactors,

\[ \det(A) = a_{i1} A_{i1} + a_{i2} A_{i2} + \cdots + a_{in} A_{in}, \]

gives a way to compute det inductively by expansion on rows, as it is called.
It reduces computation of $\det(A)$ to computing the determinants $A_{ij}$ of smaller
matrices, and hence ultimately to det of $1 \times 1$ matrices.

Anything done with rows can also be done with columns, because the
formula (**) in Section 7.1 shows that interchange of rows and columns makes
no change in the value of det. Namely, since the $+$ or $-$ sign depends only on
the permutation $\sigma$,

\[ D(B_1, \ldots, B_n) = \sum_\sigma \pm b_{1\sigma(1)} b_{2\sigma(2)} \cdots b_{n\sigma(n)} = \sum_\sigma \pm b_{\sigma(1)1} b_{\sigma(2)2} \cdots b_{\sigma(n)n}, \]

because the second sum is simply a rearrangement of the first, putting the
second subscripts in increasing order instead of the first subscripts. (The
rearrangement replaces each $\sigma$ by its inverse, which has the same sign.)
It follows in particular that expansion on column $i$ is valid:

\[ \det(A) = a_{1i} A_{1i} + a_{2i} A_{2i} + \cdots + a_{ni} A_{ni}. \]

It also follows that the matrix $A^T$ obtained by interchanging rows and columns
of $A$ (the transpose of $A$) has the same determinant as $A$.

It must be confessed that all of these methods are hopelessly inefficient in 
practice, but they suffice for our theoretical purposes. Practical methods for 
computing determinants are related to practical methods for solving systems 
of linear equations, which readers have probably seen in introductory linear 
algebra courses. (In particular, we can mimic “Gaussian elimination” to reduce 
the computation of the determinant to that of finding the determinant of a 
triangular matrix, which is simply the product of its diagonal elements.) 
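
The elimination idea mentioned in the last paragraph can be sketched as follows. This is an illustrative version only, under the assumption that the entries are rational numbers (so that division is available); the name `det_elimination` is ours, and the procedure works on rows, which is legitimate because the determinant is unchanged by transposition.

```python
from fractions import Fraction

def det_elimination(a):
    """Determinant over the rationals by reduction to triangular form."""
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available, so the determinant is zero
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # swapping two rows reverses the sign
        det *= m[col][col]  # accumulate the diagonal entry of the triangular form
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Subtracting a multiple of one row from another leaves det unchanged.
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return det
```

For an $n \times n$ matrix this uses on the order of $n^3$ arithmetic operations, compared with the $n!$ terms of formula (**).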


Exercises 


First, a couple of interesting particular computations: 


1. Prove that

\[ \det \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix} = 3abc - a^3 - b^3 - c^3. \]

2. Prove that

\[ \det \begin{pmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{pmatrix} = (a - b)(b - c)(c - a) \]

by expansion on the first column. (We take up a generalization of this
result in Section 7.6.)

3. How could one anticipate the factorization $(a - b)(b - c)(c - a)$ without
expanding the determinant?


Next, a useful general fact about multiplying row elements by the “wrong”
cofactors. Suppose that $A$ is the matrix with entries $a_{ij}$ and let $A'$ be the result
of replacing row $j$ of $A$ by row $i$.


4. Explain why $\det(A') = a_{i1} A_{j1} + \cdots + a_{in} A_{jn}$, where the $A_{jk}$ are the
cofactors of $a_{jk}$ in $A$.

5. Deduce that $a_{i1} A_{j1} + \cdots + a_{in} A_{jn} = 0$ when $j \neq i$.

6. State and prove the corresponding fact about multiplying elements of a
column by the “wrong” cofactors.


7.3 Determinants and Linear Equations 


The results of the last two sections, including the rules for calculating det and
its multiplicative property, are valid for matrices with entries in any ring. In the
present section we specialize to matrices with entries from a field $F$, because
we wish to study linear dependence. Thus, the columns $A_1, \ldots, A_n$ of an $n \times n$
matrix $A$ are now vectors in the $n$-dimensional space $F^n$.

Under these conditions we have the following criterion for det(A) = 0. 


Criterion for zero determinant. If $A$ is an $n \times n$ matrix with entries from a
field $F$, then $\det(A) = 0$ if and only if the columns $A_1, \ldots, A_n$ of $A$ are linearly
dependent.


Proof. Suppose first that $A_1, \ldots, A_n$ are linearly dependent. Then some
column $A_k$ is a linear combination $C_k$ of the others, in which case

\[ \det(\ldots, A_k, \ldots) = \det(\ldots, A_k - C_k, \ldots) = \det(\ldots, 0, \ldots) = 0 \]

by the consequences of Axioms 1, 2, and 3 in Section 7.1.

To prove the converse, first observe what happens if every vector $V \in F^n$
is a linear combination of $A_1, \ldots, A_n$. Then, in particular,

\[ U_i = b_{1i} A_1 + \cdots + b_{ni} A_n, \quad \text{for } i = 1, \ldots, n, \]

where the $U_i$ are the columns of the identity matrix $I$. In that case,

\[ 1 = \det I = \det(A) \det(B), \quad \text{where $B$ is the matrix of the $b_{ij}$,} \]

by the multiplicative property of det. Hence, $\det(A) \neq 0$.

Now suppose $A_1, \ldots, A_n$ are linearly independent and consider any vector
$V \in F^n$. Then the $n + 1$ vectors $A_1, \ldots, A_n, V \in F^n$ are linearly dependent by the
exchange lemma of Section 6.2, so there are $x_1, \ldots, x_n, y \in F$, not all zero,
such that

\[ A_1 x_1 + \cdots + A_n x_n + V y = 0. \]

In particular, $y \neq 0$, otherwise $A_1, \ldots, A_n$ are linearly dependent. This means
we can write an arbitrary $V$ as a linear combination of $A_1, \ldots, A_n$:

\[ V = -\frac{1}{y} (A_1 x_1 + \cdots + A_n x_n). \]

Then, as we have just shown, $\det(A) \neq 0$. □


Corollary. A set of linear homogeneous equations with matrix $A$

\[ a_{i1} x_1 + \cdots + a_{in} x_n = 0, \quad \text{for } i = 1, \ldots, n, \qquad (*) \]

has a nonzero solution if and only if $\det(A) = 0$.


Proof. This set of equations is equivalent to the vector equation

\[ A_1 x_1 + \cdots + A_n x_n = 0, \]

where $A_1, \ldots, A_n$ are the columns of $A$. So the existence of a nonzero
solution is equivalent to linear dependence of $A_1, \ldots, A_n$, and hence to
$\det(A) = 0$. □
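
For a concrete instance of the criterion and its corollary, here is a small worked example. The matrix is our own choice for illustration and does not come from the text.

```latex
\[
A = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix},
\qquad A_3 = 2A_2 - A_1 ,
\]
so the columns are linearly dependent, and expansion on the first row gives
\[
\det(A) = 1\,(45 - 48) - 4\,(18 - 24) + 7\,(12 - 15) = -3 + 24 - 21 = 0 .
\]
Correspondingly, $x_1 = 1$, $x_2 = -2$, $x_3 = 1$ is a nonzero solution of
$A_1 x_1 + A_2 x_2 + A_3 x_3 = 0$.
```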


This corollary is the property of det assumed in Section 4.6, in order to 
prove that algebraic integers are closed under sum, difference, and product. 
We do not need more general applications of determinants to linear equations, 
but in view of their general importance, we will look at them in the exercises 
below. 


Remark. The corollary extends to equations with coefficients in a domain $R$
(which was the case in Section 4.6) by embedding $R$ in its field of fractions in
order to carry out the argument about linear independence. So the conclusion
that $\det(A) = 0$, where $A$ is the matrix of coefficients, remains valid.


Exercises 


We now generalize the equations (*) in the above corollary to the general set
of $n$ linear equations in $n$ unknowns,

\[ a_{i1} x_1 + \cdots + a_{in} x_n = b_i, \quad \text{for } i = 1, \ldots, n. \qquad (**) \]

This set is equivalent to the vector equation

\[ A_1 x_1 + \cdots + A_n x_n = B, \quad \text{where } B = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \]

and $A_1, \ldots, A_n$ are the columns of the matrix $A$ with entries $a_{ij}$.


1. Explain why there is a solution of (**) for any $B$ if and only if $\det(A) \neq 0$.


Now we will solve the equations (**) by choosing any $k$ and multiplying
the $i$th equation by the cofactor $A_{ik}$ of $a_{ik}$, so that it becomes

\[ a_{i1} A_{ik} x_1 + \cdots + a_{in} A_{ik} x_n = b_i A_{ik}. \]

2. Summing the latter equations for $i = 1, \ldots, n$, show that
\[ \text{coefficient of } x_k = a_{1k} A_{1k} + \cdots + a_{nk} A_{nk} = \det(A). \]

3. Also show, with the help of exercises 5 and 6 in Section 7.2, that
\[ \text{coefficient of } x_j = a_{1j} A_{1k} + \cdots + a_{nj} A_{nk} = 0, \quad \text{when } j \neq k. \]

4. Deduce that $\det(A)\, x_k = b_1 A_{1k} + \cdots + b_n A_{nk}$, and that the right-hand side
is the determinant of the matrix $A'$ obtained by replacing the $k$th column of
$A$ by $B$.


The formula for solving the equations (**) implicit in exercise 4 is called 
Cramer’s rule, after its appearance in Cramer (1750). 
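
Cramer’s rule can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the names `det` and `cramer` are ours, the coefficients are assumed to be integers or rationals, matrices are lists of rows, and the determinant is computed by expansion on the first row, so it is suitable only for small systems.

```python
from fractions import Fraction

def det(a):
    """Determinant by expansion on the first row (adequate for small matrices)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j in range(len(a)):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(sub)
    return total

def cramer(a, b):
    """Solve the system with matrix a and right-hand side b; requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule needs det(A) != 0")
    solution = []
    for k in range(len(a)):
        # Replace column k of A by the right-hand side B.
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution
```

For example, for the system $2x_1 + x_2 = 3$, $x_1 + 3x_2 = 5$, the call `cramer([[2, 1], [1, 3]], [3, 5])` gives $x_1 = 4/5$ and $x_2 = 7/5$.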


Figure 7.2 Gabriel Cramer (1704-1752). Courtesy of the Bibliothèque de Genève.


7.4 Basis Independence


If a linear map $f$ has a matrix $M$ with respect to a certain basis, and if $B$ is the
matrix for the linear map sending the old basis to a new one, then the matrix
for $f$ with respect to the new basis is $BMB^{-1}$. (Reading from right to left,
$BMB^{-1}$ says “map the new basis to the old one, do $M$, then map the old basis
back to the new.”)

A function $g$ of $M$ is said to be basis-independent, and hence dependent
only on the underlying linear map $f$, if $g(BMB^{-1}) = g(M)$ for any basis
change matrix $B$. An example of basis independence that we have already
seen, in the exercises to Section 6.3, is that of the det function. We revisit this
proof now.


Basis independence of det. If $B$ is a basis change matrix, then
$\det(BMB^{-1}) = \det(M)$.


Proof. Since $BB^{-1} = I$, taking det of both sides and using the multiplicative
property gives

\[ 1 = \det(I) = \det(BB^{-1}) = \det(B) \det(B^{-1}). \]

This implies $\det(B^{-1}) = \det(B)^{-1}$, and it follows that

\begin{align*}
\det(BMB^{-1}) &= \det(B) \det(M) \det(B^{-1}) \\
&= \det(B) \det(B)^{-1} \det(M) = \det(M). \qquad \square
\end{align*}


A variation of this argument gives basis independence of the characteristic
polynomial of a matrix $M$, $\operatorname{charpoly}(M)$, which is defined to be $\det(xI - M)$,
where $I$ is the identity matrix.


Basis independence of charpoly. If $B$ is a basis change matrix,
$\operatorname{charpoly}(BMB^{-1}) = \operatorname{charpoly}(M)$.

Proof. When $B$ is the matrix for a change of basis, notice that

\[ xI - BMB^{-1} = xBIB^{-1} - BMB^{-1} = B(xI - M)B^{-1}, \]

since $BIB^{-1} = BB^{-1}I = I$. It follows that

\begin{align*}
\det(xI - BMB^{-1}) &= \det(B(xI - M)B^{-1}) \\
&= \det(B) \det(xI - M) \det(B^{-1}) \\
&= \det(B) \det(B)^{-1} \det(xI - M) \\
&= \det(xI - M).
\end{align*}

That is,

\[ \operatorname{charpoly}(BMB^{-1}) = \operatorname{charpoly}(M). \qquad \square \]


We make use of this basis independence in the next section, where we
calculate the characteristic polynomial for the “multiply by $\alpha$” map in terms
of the most convenient basis for a given algebraic number $\alpha$. But first, let us
pursue an obvious corollary of the invariance of charpoly:


Corollary. All coefficients of charpoly(M) are basis-independent. 


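One coefficient of interest is the constant term, which equals $(-1)^n \det(M)$
if $M$ is an $n \times n$ matrix. This can be seen by setting $x = 0$. Another is the
coefficient of $x^{n-1}$. To find this coefficient we let

\[ M = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \quad \text{so} \quad
xI - M = \begin{pmatrix} x - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & x - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & x - a_{nn} \end{pmatrix}. \]

Then it is clear that all the $x^{n-1}$ terms in $\det(xI - M)$ arise from the product
of the diagonal terms, $(x - a_{11})(x - a_{22}) \cdots (x - a_{nn})$, because every other
term in formula (**) of Section 7.1 omits at least two diagonal entries and so
has degree at most $n - 2$ in $x$. Hence

\[ \text{coefficient of } x^{n-1} = -(a_{11} + a_{22} + \cdots + a_{nn}). \]


Definition and notation. The trace of an $n \times n$ matrix $M$ with $(i, j)$-entry $a_{ij}$
is defined by

\[ \operatorname{Tr}(M) = a_{11} + a_{22} + \cdots + a_{nn}. \]


With this definition we can now write

\[ \operatorname{charpoly}(M) = x^n - \operatorname{Tr}(M) x^{n-1} + \cdots + (-1)^n \det(M). \]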
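
The shape of the characteristic polynomial, and its invariance under $M \mapsto BMB^{-1}$, can be checked on an example. The sketch below is illustrative and rests on our own conventions: polynomials are represented as coefficient lists with the constant term first, the helper names are hypothetical, $\det(xI - M)$ is computed by expansion on the first row, and the matrix $B$ is chosen with determinant 1 so that $B^{-1}$ has integer entries.

```python
def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def poly_det(a):
    """Determinant of a square matrix of polynomials, by expansion on the first row."""
    if len(a) == 1:
        return a[0][0]
    total = [0]
    for j in range(len(a)):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        term = poly_mul(a[0][j], poly_det(sub))
        if j % 2:
            term = [-c for c in term]  # checkerboard sign for the first row
        total = poly_add(total, term)
    return total

def charpoly(m):
    """Coefficients of det(xI - M), constant term first."""
    n = len(m)
    x_minus_m = [[[-m[i][j], 1] if i == j else [-m[i][j]] for j in range(n)]
                 for i in range(n)]
    return poly_det(x_minus_m)

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

M = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
B_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]

coeffs = charpoly(M)
n = len(M)
det_M = poly_det([[[x] for x in row] for row in M])[0]
assert coeffs[n] == 1                                    # leading coefficient
assert coeffs[n - 1] == -sum(M[i][i] for i in range(n))  # -Tr(M)
assert coeffs[0] == (-1) ** n * det_M                    # (-1)^n det(M)
assert charpoly(mat_mul(mat_mul(B, M), B_inv)) == coeffs  # basis independence
```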


Remark. One property of trace that follows immediately from its definition is
$\operatorname{Tr}(M + M') = \operatorname{Tr}(M) + \operatorname{Tr}(M')$.


Exercises


The basis-independence of trace can also be proved directly from its definition.


1. Prove from the definition of trace that $\operatorname{Tr}(CD) = \operatorname{Tr}(DC)$.
2. Deduce from the previous exercise that $\operatorname{Tr}(A) = \operatorname{Tr}(BAB^{-1})$.


7.5 Trace and Norm of an Algebraic Number 


Given an algebraic number field $E = \mathbb{Q}(\alpha)$, we now adapt the idea of Section
6.4 to express any $\beta \in \mathbb{Q}(\alpha)$ as a matrix: namely, the matrix $\boldsymbol{\beta}$ that expresses
the linear map $x \mapsto \beta x$ (“multiplication by $\beta$”) of $\mathbb{Q}(\alpha)$ relative to the basis
elements $1, \alpha, \ldots, \alpha^{n-1}$.

The matrix $\boldsymbol{\beta}$ for multiplication by $\beta$ of course depends on the choice of
basis, but two important functions of a matrix are basis independent: its trace
and its determinant, as we saw in the previous section. Because of this, the
trace and determinant of the matrix for multiplication by $\beta$ depend only on $\beta$
and the field $E$.


Definitions and notation. If $\beta \in E = \mathbb{Q}(\alpha)$, we define its trace in the extension
$E$ of $\mathbb{Q}$, $\operatorname{Tr}_{E/\mathbb{Q}}(\beta)$, to be $\operatorname{Tr}(\boldsymbol{\beta})$, where $\boldsymbol{\beta}$ is the matrix for multiplication by
$\beta$ in $E$. $N_{E/\mathbb{Q}}(\beta)$, the norm of $\beta$ in the extension $E$, is defined to be $\det(\boldsymbol{\beta})$.¹


When it is clear which extension field $E$ of $\mathbb{Q}$ we mean, we simply write
$\operatorname{Tr}(\beta)$ and $N(\beta)$. We know from the sections above that trace and det are basis
independent, and also their main properties, such as:

\[ \operatorname{Tr}(M + M') = \operatorname{Tr}(M) + \operatorname{Tr}(M'), \qquad \det(MM') = \det(M) \det(M'). \]

Since the matrices for multiplication by $\beta + \beta'$ and by $\beta\beta'$ are the sum and
the product of the matrices for $\beta$ and $\beta'$, these imply

\[ \operatorname{Tr}(\beta + \beta') = \operatorname{Tr}(\beta) + \operatorname{Tr}(\beta'), \qquad N(\beta\beta') = N(\beta) N(\beta'), \quad \text{for any } \beta, \beta' \in E. \]


The latter multiplicative property of norm follows from the multiplicative
property of det and is a broad generalization of the multiplicative property of
norm we observed in the special case of Gaussian integers back in Section 2.3.
The multiplicative property is also related to the minimal polynomial for the
number $\beta$, and its relation to the matrix $\boldsymbol{\beta}$ and its characteristic polynomial.


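¹ The subscript $E/\mathbb{Q}$ does not denote a quotient structure, but is rather the algebraist’s pun for
“$E$ over $\mathbb{Q}$.” Much as I deplore the pun, and its clash with the notation for quotient structures,
it seems too widely used to resist.


Minimal and characteristic polynomials. If $p(x)$, of degree $n$, is the minimal
polynomial of $\alpha$, and $\boldsymbol{\alpha}$ is the matrix for the map $u \mapsto \alpha u$ of $\mathbb{Q}(\alpha)$ with
respect to the basis $1, \alpha, \ldots, \alpha^{n-1}$, then the characteristic polynomial of $\boldsymbol{\alpha}$
equals $p(x)$.


Proof. Suppose that $p(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_0$, where $a_0, \ldots, a_{n-1} \in \mathbb{Q}$.
Then, as we saw in Section 6.4,

\[ \boldsymbol{\alpha} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{pmatrix}. \]

It follows that

\[ xI - \boldsymbol{\alpha} = \begin{pmatrix} x & -1 & 0 & \cdots & 0 & 0 \\ 0 & x & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & x & -1 \\ a_0 & a_1 & a_2 & \cdots & a_{n-2} & x + a_{n-1} \end{pmatrix}. \]

By expanding along the first column, repeatedly, we find that

\[ \det(xI - \boldsymbol{\alpha}) = x^n + a_{n-1} x^{n-1} + \cdots + a_0 = p(x). \qquad \square \]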
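
To see the repeated expansion at work, here is the case $n = 3$ written out. It is a worked instance of the argument above, with $p(x) = x^3 + a_2 x^2 + a_1 x + a_0$; the only nonzero entries of the first column are $x$ (row 1) and $a_0$ (row 3), both with checkerboard sign $+$.

```latex
\begin{align*}
\det \begin{pmatrix} x & -1 & 0 \\ 0 & x & -1 \\ a_0 & a_1 & x + a_2 \end{pmatrix}
&= x \det \begin{pmatrix} x & -1 \\ a_1 & x + a_2 \end{pmatrix}
 + a_0 \det \begin{pmatrix} -1 & 0 \\ x & -1 \end{pmatrix} \\
&= x \left( x^2 + a_2 x + a_1 \right) + a_0 \cdot 1 \\
&= x^3 + a_2 x^2 + a_1 x + a_0 .
\end{align*}
```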


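We generalize this result to the minimal polynomial $q(x)$ for an arbitrary
$\beta \in \mathbb{Q}(\alpha)$ by following the idea in Section 6.6. In outline, the argument is this.

Let $\boldsymbol{\beta}^*$ be the matrix for the map $v \mapsto \beta v$ in $\mathbb{Q}(\beta)$ with respect to the basis
$1, \beta, \ldots, \beta^{m-1}$, where $m$ is the degree of $q(x)$. Expand this basis to a basis of
$\mathbb{Q}(\alpha)$ by multiplying each $\beta^j$ by a basis element $\gamma^i$ for $\mathbb{Q}(\alpha)$ over $\mathbb{Q}(\beta)$, where
$\gamma$ exists by the theorem of the primitive element. Then the matrix for $v \mapsto \beta v$
in $\mathbb{Q}(\alpha)$ becomes

\[ \boldsymbol{\beta} = \begin{pmatrix} \boldsymbol{\beta}^* & 0 & \cdots & 0 \\ 0 & \boldsymbol{\beta}^* & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \boldsymbol{\beta}^* \end{pmatrix}, \]

where $0$ denotes the $m \times m$ zero matrix, when the basis vectors are ordered

\[ 1, \beta, \ldots, \beta^{m-1}, \quad \gamma, \gamma\beta, \ldots, \gamma\beta^{m-1}, \quad \ldots, \quad \gamma^{n/m-1}, \gamma^{n/m-1}\beta, \ldots, \gamma^{n/m-1}\beta^{m-1}. \]

It follows from the argument for $\alpha$ above that the characteristic polynomial of
$\boldsymbol{\beta}^*$ is the minimal polynomial of $\beta$, namely $q(x)$. Then the shape of $\boldsymbol{\beta}$ shows
that its characteristic polynomial is $q(x)^{n/m}$, which equals $q(x)^{[E:\mathbb{Q}(\beta)]}$.

Now $q(x)^{[E:\mathbb{Q}(\beta)]}$ is the polynomial whose factorization we found in
Section 6.6:

\[ q(x)^{[E:\mathbb{Q}(\beta)]} = (x - \sigma_1(\beta)) \cdots (x - \sigma_n(\beta)). \qquad (*) \]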
This formula gives us formulas for $\operatorname{Tr}(\beta)$ and $N(\beta)$:


Trace and norm in terms of the $\sigma_i$. For any $\beta \in F$, where $F \supseteq \mathbb{Q}$ is a field
extension of degree $n$,

\[ \operatorname{Tr}(\beta) = \sigma_1(\beta) + \cdots + \sigma_n(\beta), \qquad N(\beta) = \sigma_1(\beta) \cdots \sigma_n(\beta), \]

where the $\sigma_i$ are the $n$ embeddings $F \to \mathbb{C}$.


Proof. We know from the previous section that the characteristic polynomial
of an $n \times n$ matrix $M$ is related to $\operatorname{Tr}(M)$ and $\det(M) = N(M)$ by

\[ \operatorname{charpoly}(M) = x^n - \operatorname{Tr}(M) x^{n-1} + \cdots + (-1)^n \det(M). \]

Comparing this with the expression (*) for $\operatorname{charpoly}(\boldsymbol{\beta})$, we see that

\[ \operatorname{Tr}(\beta) = \sigma_1(\beta) + \cdots + \sigma_n(\beta), \qquad N(\beta) = \sigma_1(\beta) \cdots \sigma_n(\beta). \qquad \square \]


Remarks. Since the matrix $\boldsymbol{\beta}$ for multiplication by $\beta$ expresses a linear map
of $\mathbb{Q}(\alpha)$ (viewed as a vector space over $\mathbb{Q}$), its entries are rational numbers.
It follows that its trace and norm (that is, $\operatorname{Tr}(\beta)$ and $N(\beta)$) are rational
numbers. Indeed, if $\beta$ is an algebraic integer, then the entries $-a_0, \ldots, -a_{n-1}$
in the matrix $\boldsymbol{\beta}$ are ordinary integers, and hence $\operatorname{Tr}(\beta)$ and $N(\beta)$ are ordinary
integers.


The formula $N(\beta) = \sigma_1(\beta) \cdots \sigma_n(\beta)$ confirms the agreement between the
determinant concept of norm and the special norms used in Sections 1.5, 2.3,
and 2.9. There we defined the norm of $\beta$ to be the product of the conjugates
of $\beta$, and found that it was rational-valued and multiplicative. In those cases
there were two conjugates, but in the case of degree $n$ the conjugates of $\beta$ are
$\sigma_1(\beta), \ldots, \sigma_n(\beta)$.

This formula also shows the multiplicative property of norm (which we
first saw as a consequence of the multiplicative property of det) in the light of
the multiplicative property of the isomorphisms $\sigma_i$. Namely,

\[ N(\beta\beta') = \sigma_1(\beta\beta') \cdots \sigma_n(\beta\beta') = \sigma_1(\beta)\sigma_1(\beta') \cdots \sigma_n(\beta)\sigma_n(\beta') = N(\beta) N(\beta'). \]
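
The quadratic case shows both descriptions of trace and norm side by side. The following worked example is our own illustration: it assumes $d$ is a rational number that is not a square, takes $E = \mathbb{Q}(\sqrt{d})$ with basis $1, \sqrt{d}$, and writes the images of the basis elements as the rows of the matrix (the transposed convention gives the same trace and determinant).

```latex
For $\beta = a + b\sqrt{d}$ with $a, b \in \mathbb{Q}$,
\[
\beta \cdot 1 = a + b\sqrt{d}, \qquad
\beta \cdot \sqrt{d} = bd + a\sqrt{d}, \qquad
\text{so the matrix for multiplication by $\beta$ is }
\begin{pmatrix} a & b \\ bd & a \end{pmatrix} .
\]
Hence
\[
\operatorname{Tr}(\beta) = 2a = (a + b\sqrt{d}) + (a - b\sqrt{d}), \qquad
N(\beta) = a^2 - d b^2 = (a + b\sqrt{d})(a - b\sqrt{d}),
\]
in agreement with the sum and product of the two conjugates
$\sigma_1(\beta) = a + b\sqrt{d}$ and $\sigma_2(\beta) = a - b\sqrt{d}$.
For $d = -1$ this is the norm $a^2 + b^2$ of the Gaussian integers.
```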


Exercises


A field in which traces and norms are easily computed from conjugates is the
cyclotomic field $\mathbb{Q}(\zeta_p)$, where $p$ is prime and

\[ \zeta_p = \cos\frac{2\pi}{p} + i \sin\frac{2\pi}{p}. \]

A cyclotomic field we already know is $\mathbb{Q}(\sqrt{-3})$, generated by
$\zeta_3 = \frac{-1 + \sqrt{-3}}{2}$ (which we sometimes called just $\zeta$). We know that $\mathbb{Q}(\zeta_3)$ is of degree
2 over $\mathbb{Q}$, and in fact $\mathbb{Q}(\zeta_p)$ is of degree $p - 1$, since its minimal polynomial is

\[ x^{p-1} + x^{p-2} + \cdots + x + 1, \qquad (*) \]

the nontrivial factor of the polynomial $x^p - 1$ obviously satisfied by $\zeta_p$. We
proved irreducibility of the polynomial (*) in the exercises to Section 4.7.

Now we will look at some consequences of irreducibility for conjugates,
trace, and norm in $\mathbb{Q}(\zeta_p)$.


1. Explain why $\zeta_p, \zeta_p^2, \ldots, \zeta_p^{p-1}$ are the conjugates of $\zeta_p$.
2. Find the sum of these conjugates, and hence show that
\[ \operatorname{Tr}(\zeta_p) = \operatorname{Tr}(\zeta_p^2) = \cdots = \operatorname{Tr}(\zeta_p^{p-1}) = -1. \]
3. Explain why $\operatorname{Tr}(1) = p - 1$ and deduce that
\[ \operatorname{Tr}(1 - \zeta_p) = \operatorname{Tr}(1 - \zeta_p^2) = \cdots = \operatorname{Tr}(1 - \zeta_p^{p-1}) = p. \]
4. Also explain why
\[ N(1 - \zeta_p) = (1 - \zeta_p)(1 - \zeta_p^2) \cdots (1 - \zeta_p^{p-1}). \]


Now we can compute $N(1 - \zeta_p)$ in a different way as the constant term of
its minimal polynomial.


5. Explain why the minimal polynomial for $x = 1 - \zeta_p$ is
\[ (1 - x)^{p-1} + (1 - x)^{p-2} + \cdots + (1 - x) + 1 \]
and deduce that $p = N(1 - \zeta_p) = (1 - \zeta_p)(1 - \zeta_p^2) \cdots (1 - \zeta_p^{p-1})$.


In the exercises to Section 8.3 these calculations will give the integers of $\mathbb{Q}(\zeta_p)$.
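
The values $-1$ and $p$ appearing in these exercises can be confirmed numerically for small primes. The sketch below is only a floating-point check, not a proof: the function names are ours, and the tolerance `1e-9` is an arbitrary choice that is comfortably larger than the rounding error for these small cases.

```python
import cmath

def zeta(p):
    """The root of unity cos(2*pi/p) + i*sin(2*pi/p)."""
    return cmath.exp(2j * cmath.pi / p)

def trace_zeta(p):
    """Sum of the conjugates zeta^1, ..., zeta^(p-1)."""
    z = zeta(p)
    return sum(z ** k for k in range(1, p))

def norm_one_minus_zeta(p):
    """Product of (1 - zeta^k) for k = 1, ..., p-1."""
    z = zeta(p)
    product = 1
    for k in range(1, p):
        product *= 1 - z ** k
    return product

for p in (3, 5, 7, 11):
    assert abs(trace_zeta(p) - (-1)) < 1e-9
    assert abs(norm_one_minus_zeta(p) - p) < 1e-9
```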


7.6 Discriminant 


The theorem of the primitive element tells us that we are lucky in the theory of
algebraic number fields: each of them is a vector space over $\mathbb{Q}$ generated by a
single element $\alpha$ and with basis $1, \alpha, \ldots, \alpha^{n-1}$. For rings of algebraic integers,
coming in the next chapter, the situation is not so rosy. Even for the integers in
algebraic number fields of degree $n$, there is not generally a single generator,
nor a convenient basis. To prepare ourselves to face these difficulties, we make
a study of bases in general and, in particular, look for a way of testing whether
a given $n$-tuple of members of a field $F$ of degree $n$ is a basis of $F$.

The tool for answering this question is the discriminant.


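Definition. The discriminant $\Delta(\omega_1, \ldots, \omega_n)$ of $\omega_1, \ldots, \omega_n \in F$, where
$F \supseteq \mathbb{Q}$ is an extension of degree $n$, is defined to be $\det(M)^2$, where

\[ M = \begin{pmatrix} \sigma_1(\omega_1) & \sigma_1(\omega_2) & \cdots & \sigma_1(\omega_n) \\ \sigma_2(\omega_1) & \sigma_2(\omega_2) & \cdots & \sigma_2(\omega_n) \\ \vdots & \vdots & & \vdots \\ \sigma_n(\omega_1) & \sigma_n(\omega_2) & \cdots & \sigma_n(\omega_n) \end{pmatrix} \]

and the $\sigma_i : F \to \mathbb{C}$ are the $n$ embeddings of $F$.


We will find that $\det(M)$ “discriminates” between bases and nonbases
accordingly as it is nonzero or zero. The reason for squaring is that $\det(M)$
can change sign for a trivial reason (exchanging two of the $\sigma_i$), so we want to
be able to ignore its sign.

Our first result gives an alternative formula for $\Delta(\omega_1, \ldots, \omega_n)$, as det of the
trace matrix $T$ whose $(i, j)$-entry is $T_{ij} = \operatorname{Tr}_{F/\mathbb{Q}}(\omega_i \omega_j)$. As usual, we will
write simply $\operatorname{Tr}$ for $\operatorname{Tr}_{F/\mathbb{Q}}$.


Discriminant in terms of trace. If we set $T_{ij} = \operatorname{Tr}(\omega_i \omega_j)$ and let $T$ be the
matrix with $(i, j)$-entry $T_{ij}$, then

\[ \Delta(\omega_1, \ldots, \omega_n) = \det(T). \]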


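Proof. By definition, $\Delta(\omega_1, \ldots, \omega_n) = \det(M)^2$, where $M$ is the matrix with
$(i, j)$-entry $\sigma_i(\omega_j)$. From Section 7.2 we know that the same determinant
belongs to the transpose $M^T$, whose $(i, j)$-entry is $\sigma_j(\omega_i)$. Therefore, by the
multiplicative property of det, $\det(M)^2$ is the determinant of $M^T M$, whose
$(i, j)$-entry is

\begin{align*}
\sigma_1(\omega_i)\sigma_1(\omega_j) + \cdots + \sigma_n(\omega_i)\sigma_n(\omega_j)
&= \sigma_1(\omega_i \omega_j) + \cdots + \sigma_n(\omega_i \omega_j) && \text{because $\sigma_1, \ldots, \sigma_n$ are isomorphisms} \\
&= \operatorname{Tr}(\omega_i \omega_j) && \text{by the formula for trace in the previous section} \\
&= T_{ij}. && \square
\end{align*}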
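
A quadratic field gives the smallest nontrivial check of this theorem. The example below is our own illustration, assuming $d$ is a rational number that is not a square, $F = \mathbb{Q}(\sqrt{d})$, and $\omega_1 = 1$, $\omega_2 = \sqrt{d}$; the two embeddings send $\sqrt{d}$ to $\sqrt{d}$ and to $-\sqrt{d}$.

```latex
\[
M = \begin{pmatrix} 1 & \sqrt{d} \\ 1 & -\sqrt{d} \end{pmatrix}, \qquad
\det(M) = -2\sqrt{d}, \qquad
\Delta(1, \sqrt{d}) = \det(M)^2 = 4d ,
\]
while
\[
T = \begin{pmatrix} \operatorname{Tr}(1) & \operatorname{Tr}(\sqrt{d}) \\
                    \operatorname{Tr}(\sqrt{d}) & \operatorname{Tr}(d) \end{pmatrix}
  = \begin{pmatrix} 2 & 0 \\ 0 & 2d \end{pmatrix}, \qquad
\det(T) = 4d .
\]
```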

An example where we can explicitly calculate the discriminant is in the field $\mathbb{Q}(\alpha)$,
where $\alpha$ is an algebraic number of degree $n$. In this case we can use the standard
basis $1, \alpha, \ldots, \alpha^{n-1}$, so $\omega_j = \alpha^{j-1}$. Then if we let $\sigma_i(\alpha) = \alpha_i$, the definition
of discriminant says $\Delta(1, \alpha, \ldots, \alpha^{n-1})$ is the square of

\[ \det \begin{pmatrix} 1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^{n-1} \\ 1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^{n-1} \end{pmatrix}. \]

This is a classical determinant that goes back to the 18th century, called the
Vandermonde determinant.² It can be evaluated with the help of the following
remarks. We first suppose that $\alpha_1, \ldots, \alpha_n$ are simply distinct “variables,”
or “indeterminates.”


1. By summing the degrees of the columns, $0 + 1 + 2 + \cdots + (n-1) = n(n-1)/2$,
we see that the determinant is a polynomial in the variables
$\alpha_1,\ldots,\alpha_n$, each term of which has total degree $n(n-1)/2$.

2. The determinant becomes zero if any $\alpha_i = \alpha_j$ for $i \neq j$, because two rows
become equal. So the determinant has a factor $(\alpha_j - \alpha_i)$ for each pair $i, j$
with $i < j$, by the corollary in Section 1.8 (the factor theorem for
multivariable polynomials).

3. There are $n(n-1)/2$ such pairs. Namely, they are half of the pairs with
$i \neq j$, for which we can choose $i$ in $n$ ways and then $j$ in $n-1$ ways.

4. Therefore, the Vandermonde determinant equals some constant times the
product $\prod_{i<j}(\alpha_j - \alpha_i)$, since both are polynomials of total degree
$n(n-1)/2$.


By looking at a specific term in the determinant (say, the product of the
diagonal elements) we can show that the constant equals 1. What is more
important is that we are interpreting $\alpha_1,\ldots,\alpha_n$ as the conjugates of $\alpha$, so they
are all different. Therefore,

\[ \prod_{i<j}(\alpha_j - \alpha_i), \text{ and hence } \Delta(1,\alpha,\ldots,\alpha^{n-1}), \text{ is nonzero.} \]
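The evaluation of the Vandermonde determinant can be checked on small cases. The following Python sketch is our own illustration, not part of the argument: the names `det`, `vandermonde`, and `difference_product` are hypothetical, and the determinant is computed by brute force from the permutation formula, which is practical only for small matrices.

```python
from itertools import permutations
from math import prod


def det(matrix):
    """Determinant via the sum over permutations (fine for small n only)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        # The sign of a permutation is (-1)^(number of inversions).
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = prod(matrix[i][perm[i]] for i in range(n))
        total += -term if inversions % 2 else term
    return total


def vandermonde(alphas):
    """Matrix whose i-th row is 1, alpha_i, alpha_i^2, ..., alpha_i^(n-1)."""
    n = len(alphas)
    return [[a ** k for k in range(n)] for a in alphas]


def difference_product(alphas):
    """Product of (alpha_j - alpha_i) over all pairs i < j."""
    n = len(alphas)
    return prod(
        alphas[j] - alphas[i] for i in range(n) for j in range(i + 1, n)
    )


alphas = [2, 3, 5, 7]  # arbitrary distinct sample values
assert det(vandermonde(alphas)) == difference_product(alphas)
```

With a repeated value, such as `[1, 1, 4]`, both sides vanish, in line with remark 2.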


We now leverage this discovery into a proof that the discriminant of any 
basis is nonzero, with the help of the following theorem. 


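Discriminant of a transformed basis. If $\omega_1,\ldots,\omega_n$ is a basis for a field $E$ of
degree $n$ over $\mathbb{Q}$, and $\omega'_1,\ldots,\omega'_n \in E$, then

\[ \Delta(\omega'_1,\ldots,\omega'_n) = \det(A)^2\,\Delta(\omega_1,\ldots,\omega_n), \]

where $A$ is the matrix of the linear map sending the $\omega_i$ to the $\omega'_i$.


2 Although Vandermonde wrote a paper about determinants in 1771, its connection with this
particular determinant is not straightforward. See Ycart (2013) for the whole story.

Proof. Since $\omega_1,\ldots,\omega_n$ is a basis, we can express the $\omega'_i$ by linear equations

\[ \omega'_i = a_{1i}\omega_1 + \cdots + a_{ni}\omega_n. \tag{*} \]

Let $A$ be the matrix with $(i,j)$-entry $a_{ij}$. Now $\Delta(\omega'_1,\ldots,\omega'_n)$ is the square of
the det of

\[
M' = \begin{pmatrix}
\sigma_1(\omega'_1) & \sigma_1(\omega'_2) & \cdots & \sigma_1(\omega'_n) \\
\sigma_2(\omega'_1) & \sigma_2(\omega'_2) & \cdots & \sigma_2(\omega'_n) \\
\vdots & \vdots & & \vdots \\
\sigma_n(\omega'_1) & \sigma_n(\omega'_2) & \cdots & \sigma_n(\omega'_n)
\end{pmatrix}
\]

and

\[ \sigma_k(\omega'_i) = a_{1i}\sigma_k(\omega_1) + \cdots + a_{ni}\sigma_k(\omega_n) \]

by (*), since $\sigma_k$ is an isomorphism.

It follows that $M' = MA$, where $M$ is the matrix, with $(k,j)$-entry $\sigma_k(\omega_j)$, whose
squared det equals $\Delta(\omega_1,\ldots,\omega_n)$. Taking det of both sides of $M' = MA$, and
squaring, therefore gives

\[ \Delta(\omega'_1,\ldots,\omega'_n) = \det(A)^2\,\Delta(\omega_1,\ldots,\omega_n). \qquad \square \]

Corollary. For any $\omega_1,\ldots,\omega_n \in E$, a field of degree $n$ over $\mathbb{Q}$,

\[ \omega_1,\ldots,\omega_n \text{ is a basis of } E \iff \Delta(\omega_1,\ldots,\omega_n) \neq 0. \]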


Proof. The theorem of the primitive element gives a basis of $E$ of the form
$1,\alpha,\ldots,\alpha^{n-1}$, for which $\Delta \neq 0$. The matrix $A$ of the map sending this basis
to any other basis is invertible, and hence $\det(A) \neq 0$. Then the theorem above
gives $\Delta \neq 0$ for the new basis.

Conversely, if $A$ maps a basis to a nonbasis, then $\det(A) = 0$, so $\Delta$ of the
nonbasis is zero. □
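Both the transformation formula and the corollary can be tried out in a quadratic field. The sketch below is our own illustration under stated assumptions: it works in $\mathbb{Q}(\sqrt{5})$ (the choice `D = 5` is arbitrary), stores $a + b\sqrt{5}$ as the pair `(a, b)`, and computes the discriminant as the determinant of the trace matrix. All function names are hypothetical.

```python
from fractions import Fraction

D = 5  # assumption for illustration: we work in Q(sqrt(5))


def mul(x, y):
    """Product of x = (a, b) and y = (c, e), read as a + b*sqrt(D), c + e*sqrt(D)."""
    a, b = x
    c, e = y
    return (a * c + D * b * e, a * e + b * c)


def trace(x):
    """Tr(a + b*sqrt(D)) = (a + b*sqrt(D)) + (a - b*sqrt(D)) = 2a."""
    return 2 * x[0]


def disc(w1, w2):
    """Discriminant of w1, w2 as the determinant of the 2 x 2 trace matrix."""
    t11 = trace(mul(w1, w1))
    t12 = trace(mul(w1, w2))
    t22 = trace(mul(w2, w2))
    return t11 * t22 - t12 * t12


def transform(basis, A):
    """New elements w'_i = A[0][i]*w1 + A[1][i]*w2, following equation (*)."""
    w1, w2 = basis
    return tuple(
        (A[0][i] * w1[0] + A[1][i] * w2[0], A[0][i] * w1[1] + A[1][i] * w2[1])
        for i in range(2)
    )


def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


basis = ((1, 0), (0, 1))            # the basis 1, sqrt(5)
half = Fraction(1, 2)
A = [[1, half], [0, half]]          # sends it to 1, (1 + sqrt(5))/2
new_basis = transform(basis, A)
assert disc(*new_basis) == det2(A) ** 2 * disc(*basis)

B = [[1, 2], [0, 0]]                # sends the basis to 1, 2, which is not a basis
assert det2(B) == 0 and disc(*transform(basis, B)) == 0
```

Here the discriminant drops from 20 to 5 because $\det(A)^2 = 1/4$, while the dependent pair 1, 2 has discriminant 0.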


Remark. The formula for the discriminant of a transformed basis is a big
generalization of the discovery of Lagrange, mentioned in Section 3.3, about
the discriminants of quadratic forms. The maps considered by Lagrange have
determinant $\pm 1$, which is why they leave the discriminant unchanged.


Exercises 


1. Show from the formula for $\Delta$ that $\Delta(1, i) = -4$, which agrees with
Lagrange’s discriminant $D$ for the form $x^2 + y^2 = (x + iy)(x - iy)$.

2. Likewise, show that $\Delta(1, \sqrt{-2}) = -8$, which agrees with Lagrange’s
discriminant $D$ for the form $x^2 + 2y^2 = (x + \sqrt{-2}\,y)(x - \sqrt{-2}\,y)$.

3. Show that $\Delta(1, \zeta) = -3$, which agrees with Lagrange’s discriminant $D$
for the form $x^2 + xy + y^2 = (x - \zeta y)(x - \bar{\zeta} y)$, where $\zeta = e^{2\pi i/3}$.


For a more general result that encompasses the three above, consider
a quadratic form $ax^2 + bxy + cy^2$ for which Lagrange’s discriminant
$D = b^2 - 4ac$ is not a square. Then $ax^2 + bxy + cy^2$ does not have a rational
factorization, and the numbers

\[ \frac{-b + \sqrt{D}}{2a} \quad\text{and}\quad \frac{-b - \sqrt{D}}{2a} \]

are conjugate members of the quadratic field $\mathbb{Q}(\sqrt{D})$.


4. Show that

\[ ax^2 + bxy + cy^2 = a\left(x - \frac{-b + \sqrt{D}}{2a}\,y\right)\left(x - \frac{-b - \sqrt{D}}{2a}\,y\right). \]


5. Explain why $1, \frac{-b+\sqrt{D}}{2}$ is a basis for the quadratic field $\mathbb{Q}(\sqrt{D})$, and show
that

\[ \Delta\left(1, \frac{-b+\sqrt{D}}{2}\right) = D. \]

The reason for choosing $\frac{-b+\sqrt{D}}{2}$ as a basis element, rather than $\frac{-b+\sqrt{D}}{2a}$, is
that $\frac{-b+\sqrt{D}}{2}$ is an algebraic integer, as defined in Section 4.6.


6. Check that $\frac{-b+\sqrt{D}}{2}$ is the root of a monic polynomial equation with
ordinary integer coefficients.


Thus we have actually found an integral basis for the field, an idea that will 
be important in the next chapter — see Section 8.3. 


7.7 Discussion 


The theory of norm and discriminant in this chapter is essentially the same as 
that given by Dedekind in the 1870s; see (Dedekind, 1996, pp. 112-116). He 
derives the multiplicative property of the norm directly from the multiplicative 
property of the isomorphisms $\sigma_i$ but uses determinant theory to prove the
discriminant criterion for a basis. Dedekind shows that the discriminant of any
basis is nonzero via the Vandermonde determinant, whose value is nonzero
“by virtue of a well known proposition in the theory of determinants” (1996,
p. 112). Thus, Dedekind assumed that his readers had a strong background
in determinant theory.


7.7.1 History of Determinant Theory 


As mentioned in Section 6.7, determinant theory was once the main topic in 
linear algebra. It attracted the attention of leading mathematicians and was 
one of the main branches of algebra in the 19th century. For this reason, 
it became the subject of one of the most monumental histories of any branch
of mathematics, the four volumes of Sir Thomas Muir’s The Theory of 
Determinants in the Historical Order of Development, written between 1890 
and the 1920s. Muir’s book was reprinted as Muir (1960), but in recent decades 
it has dropped out of sight, no doubt because linear algebra is now universal 
and determinant theory is (relatively) marginal. But still, it is important, as we 
have tried to show in this chapter, so let us review some of its history. 

A noticeable feature of the history of determinants, as Muir seldom failed to
point out, is that its main results have often been discovered and rediscovered 
independently. Determinants themselves were discovered independently by 
Seki in Japan in 1683 and Leibniz in Europe at approximately the same time. 
Seki’s early results were for small determinants (up to 5 × 5), but by 1710
he had found an inductive definition of n × n determinants via cofactors,
essentially the one used in Section 7.2. This is also known as the Laplace
expansion because, yes, it was rediscovered by Laplace (1772). Leibniz’s 
work on determinants, which was never published, has been little known until 
recently. The eminent Leibniz scholar Eberhard Knobloch has concluded, in 
Knobloch (2013), that Leibniz found all the basic results, including the Laplace 
expansion, the sign of permutations, and Cramer’s rule, by 1684. 

As mentioned in Section 6.7, Cramer first published the determinant rule 
for solving systems of linear equations. The rule appeared in the widely read 
book by Cramer (1750) about algebraic curves, so by 1750 the European 
mathematical community was finally aware of the determinant concept. Its 
basic properties were elucidated over the next 50-60 years: 


• Vandermonde (1771) sketched the foundations of the theory.

• Laplace (1772) (re)discovered the cofactor expansion.

• Rothe (1800) defined even and odd permutations.

• In 1801 Gauss introduced the term “determinant” and discovered the
multiplicative property.

• In 1812 Binet and Cauchy independently discovered the multiplicative
property, and Cauchy gave a satisfactory proof.


Summing up these developments from the viewpoint of the 1890s, Muir (1960) 
wrote in his volume 1 (p. 130):


It is not too much to say, though it may come to many as a surprise, that the 
ordinary textbooks of determinants supplied to university students of the present 
day do not contain much more of the general theory than is to be found in Cauchy’s 
memoir of about eighty years ago. ... It is no doubt impossible to call him, as
some have done, the formal founder of the theory. This honour is certainly due to 
Vandermonde, who, however, erected on the foundation comparatively little of the 
superstructure. ... Cauchy relaid the foundation, rebuilt the whole, and initiated
new enlargements; the result being an edifice which the architects of today may 
still admire and find worthy of study. 


The modern theory begins with Grassmann (1844) and his sophisticated 
concept of outer product. Muir actually notes Grassmann’s contribution, but 
he does not recognize the rising tide of the modern linear algebra that was 
eventually to submerge his beloved theory of determinants. 

It is therefore interesting that the decidedly modern Artin (1942) took the 
trouble to present an elementary treatment of determinants, which was not 
really needed for the rest of his book. Perhaps he felt that determinants were 
being neglected, or handled inelegantly, and he wanted to take the opportunity 
to educate his readers. 
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Preview 


We now consider the effect of replacing the field F in the definition of vector 
space by a ring R. The resulting structures are called R-modules, and they 
have exactly the same definition as F-vector spaces, except that the “scalars” 
come from the ring R instead of from the field F. And just as the structure of
algebraic number fields can be clarified by viewing them as Q-vector spaces, 
the structure of algebraic integer rings can be clarified by viewing them as 
Z-modules. 

However, this idea meets difficulties because modules do not always have 
the standard vector space features, such as basis and dimension. In general, 
R-modules do not have bases, but there is an important class of those that 
do: the so-called free R-modules. For these, we can describe linear maps 
by matrices, as we do for vector spaces, and prove that there is an invariant 
dimension, at least for free modules with a finite basis. 

However, finding bases for modules requires deeper linear algebra than it 
does for vector spaces. There is no “primitive element theorem” saying that 
an algebraic number ring is $\mathbb{Z}[\alpha]$ for some algebraic integer $\alpha$. And proving
that submodules of a free R-module M have dimension no greater than that of 
M requires a restriction on R: that it be a principal ideal domain. Fortunately, 
this applies to R = Z, so we escape the difficulties in the case that originally 
motivated the theory of modules. 

In particular, the algebraic integer rings $\mathbb{Z}_E$ are finitely generated
Z-modules, and we can find bases for them with the help of the discriminant. 

With these results, we can revisit rings and ideals from the viewpoint of 
modules, which is a convenient generalization of both. 


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8.1 From Vector Spaces to Modules 


As mentioned in Section 6.3, a module is the analogue of a vector space where 
the field F is replaced by a ring R.


Definition. An $R$-module is a set $M$ of objects $u, v, w, \ldots$ which can be added
and multiplied by members of $R$.
Addition in an $R$-module has the properties of an Abelian group, namely

\[
\begin{aligned}
u + v &= v + u, \\
u + (v + w) &= (u + v) + w, \\
u + 0 &= u, \\
u + (-u) &= 0.
\end{aligned}
\]

Multiplication, by any members $a, b$ of $R$, has the properties

\[
\begin{aligned}
a(bu) &= (ab)u, \\
1u &= u, \\
a(u + v) &= au + av, \\
(a + b)u &= au + bu.
\end{aligned}
\]


Thus, calculations in a module, like those in a vector space, resemble 
calculations with numbers, except that multiplication is restricted to members 
of R and there is not generally any division. A vector space is the special 
case of an R-module where R is a field, but there are many other important 
examples. Here are some. 


1. Any ring $R$ is itself an $R$-module. It is generated by the element 1, in the
sense that $R = \{r1 : r \in R\}$. If $R$ is a subring of a ring $S$, then $S$ is an
$R$-module, generated by 1 and certain elements of $S$ that are not in $R$.

2. Any Abelian group $G$ is a $\mathbb{Z}$-module. Writing the group operation as +
(as in Section 3.7), if $g \in G$ and $n \in \mathbb{Z}$ we let $ng = g + \cdots + g$ (with $n$
summands) when $n > 0$, and the obvious things when $n \le 0$.

3. The ring $\mathbb{Z}[\sqrt{2}] = \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$ is a $\mathbb{Z}$-module, generated by the
two elements 1 and $\sqrt{2}$.

4. Similarly, the ring $\mathbb{Z}[\sqrt{-5}]$ is a $\mathbb{Z}$-module, generated by the two elements
1 and $\sqrt{-5}$. More interestingly, the ideal

\[ \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\} \]

in $\mathbb{Z}[\sqrt{-5}]$ is a $\mathbb{Z}$-module, generated by the elements 2 and $1 + \sqrt{-5}$.

5. Any ideal $I$ of a ring $R$ is an $R$-module contained in $R$ (an $R$-submodule
of $R$). Conversely, any $R$-submodule $I$ of $R$ is an ideal because the sum of two
elements of $I$ is in $I$, and the product of an element of $I$ by an element of $R$
is in $I$.


Because of the last two examples, modules have an important role to play in 
the study of ideals. One of our aims will be to prove that the integers of an 
algebraic number field constitute a finitely generated Z-module, and that the 
same is true of their ideals. 

One of the properties of ideals that extends to modules is the ability to form 
quotients. 


Definitions and Notation. If $M'$ is an $R$-submodule of an $R$-module $M$, then
$a, b \in M$ are said to be congruent mod $M'$, written as

\[ a \equiv b \bmod M', \]

if $a - b \in M'$. The class $[a] = \{b : a \equiv b \bmod M'\}$ is called the congruence
class of $a$ mod $M'$, and the congruence classes mod $M'$ constitute the quotient
module $M/M'$.


Of course, to justify these definitions, we have to show that congruence mod 
M’ is an equivalence relation, and check that the congruence classes form an 
R-module when addition and scalar multiplication are defined by 


\[ [a] + [b] = [a + b], \qquad c[a] = [ca], \quad\text{for any } c \in R. \]


These facts can be verified by much the same arguments that we used for the 
special case of congruence mod n in Z (Section 1.6). In fact, it seems that 
modules got their name (from Dedekind) because they admit a relation of 
congruence modulo M. 
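To make congruence mod a submodule concrete, here is a small Python sketch of the quotient of $\mathbb{Z}[\sqrt{-5}]$ by the ideal $I = \{2m + (1+\sqrt{-5})n : m, n \in \mathbb{Z}\}$ from example 4, viewed as a $\mathbb{Z}$-module. It is our own illustration: elements $a + b\sqrt{-5}$ are stored as pairs `(a, b)`, and the function names are hypothetical. The membership test rests on the observation that $a + b\sqrt{-5} = 2m + n + n\sqrt{-5}$ forces $n = b$ and $2m = a - b$.

```python
def in_ideal(x):
    """Membership of x = (a, b), read as a + b*sqrt(-5), in the ideal I.

    Solving a + b*sqrt(-5) = 2m + (1 + sqrt(-5))n gives n = b, 2m = a - b,
    so x lies in I exactly when a - b is even.
    """
    a, b = x
    return (a - b) % 2 == 0


def congruent(x, y):
    """x is congruent to y mod I when x - y lies in I."""
    return in_ideal((x[0] - y[0], x[1] - y[1]))


def add(x, y):
    """Addition in Z[sqrt(-5)], componentwise on pairs."""
    return (x[0] + y[0], x[1] + y[1])


def scale(c, x):
    """Multiplication of x by an ordinary integer c."""
    return (c * x[0], c * x[1])


def representative(x):
    """A canonical member, 0 or 1, of the congruence class [x]."""
    return ((x[0] - x[1]) % 2, 0)


# Two pairs of congruent elements: the class of a sum or a scalar multiple
# does not depend on which representatives we pick.
x1, x2 = (3, 0), (1, 4)
y1, y2 = (0, 1), (2, 3)
assert congruent(x1, x2) and congruent(y1, y2)
assert congruent(add(x1, y1), add(x2, y2))
assert congruent(scale(7, x1), scale(7, x2))
```

Since `representative` only ever returns `(0, 0)` or `(1, 0)`, this quotient has exactly two congruence classes.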

To some extent, we can study R-modules by methods of linear algebra. 
But certain convenient properties of vector spaces, such as the existence of 
bases, do not hold for all modules. Because of that, we often study R-modules 
for restricted classes of rings R, such as the principal ideal domains (PIDs) 
mentioned in Section 5.2. Since Z is a PID, we can exploit this property in 
rings that are Z-modules, even when they are not PIDs themselves. 

The definition of linear map for R-modules is exactly the same as that for 
vector spaces, but with “scalars” from the ring R instead of the field F. We 
may also call it an R-linear map and represent it by a matrix with entries 
in R. However, this is possible only if the R-module has a basis, which we 
investigate in the next section when R = Z. 
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Exercises 


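Consider the $\mathbb{Z}$-module $\mathbb{Z}[\sqrt{-5}]$ and the $\mathbb{Z}$-submodules (which are ideals)
introduced in Section 5.1 and its exercises:

\[
\begin{aligned}
I &= \{2m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}, \\
J &= \{3m + (1 + \sqrt{-5})n : m, n \in \mathbb{Z}\}.
\end{aligned}
\]

We saw in the exercises to Section 5.3 that $I$ and $J$ have different quotients
$\mathbb{Z}[\sqrt{-5}]/I = \mathbb{F}_2$ and $\mathbb{Z}[\sqrt{-5}]/J = \mathbb{F}_3$. Nevertheless, $I$ and $J$ are
geometrically similar: they have the same shape.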


1. By viewing the pictures of $I$ and $J$ in Section 5.1, show that, among
several elements of equal size,

• 2 and $1 - \sqrt{-5}$ (in that order) are the smallest two members of $I$,
• $1 + \sqrt{-5}$ and 3 (in that order) are the smallest two members of $J$.

2. Show that $\dfrac{1 - \sqrt{-5}}{2} = \dfrac{3}{1 + \sqrt{-5}}$.

3. Deduce from this equation that both $I$ and $J$ may be viewed as the vertex
sets of parallelograms of the same shape; that is, with the same angles and
the same ratio (length of longer side)/(length of shorter side).

4. Show in fact that $J$ may be obtained from $I$ by magnifying by $\sqrt{3/2}$ and
rotating by a certain angle.

5. Explain why the principal ideals of $\mathbb{Z}[\sqrt{-5}]$ all have the same shape,
which is different from the shape of $I$ and $J$.


These two “shapes” of ideals correspond to what are called ideal classes, which 
happen to be members of the class group briefly mentioned in Section 3.4. 


8.2 Algebraic Number Fields and Their Integers 


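As we saw in Section 4.6, the algebraic integers in an algebraic number field $E$
form a ring because their sums and products are integers, also in $E$. We denote
this ring by $\mathbb{Z}_E$ and view it as a $\mathbb{Z}$-module by interpreting $n\alpha$, for $n \in \mathbb{Z}$ and
$\alpha \in \mathbb{Z}_E$, as the $|n|$-fold sum

\[ \alpha + \cdots + \alpha \text{ if } n > 0, \qquad (-\alpha) + \cdots + (-\alpha) \text{ if } n < 0, \qquad\text{and } 0 \text{ if } n = 0. \]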


The members of $E$ are “fractions” formed from the integers of $E$; in fact,
they are fractions whose denominators are ordinary integers.

Algebraic numbers as fractions. If the field $E$ is an extension of $\mathbb{Q}$ of finite
degree, then each $\beta \in E$ is of the form $\alpha/b$, where $\alpha \in \mathbb{Z}_E$ and $b \in \mathbb{Z}$.


Proof. If $E$ is an extension of degree $m$, then each $\beta \in E$ satisfies an equation
of the form

\[ a_m x^m + a_{m-1}x^{m-1} + \cdots + a_0 = 0, \quad\text{where } a_0,\ldots,a_m \in \mathbb{Z} \text{ and } a_m \neq 0. \]

Multiplying through by $a_m^{m-1}$, we obtain the equation

\[ (a_m x)^m + a_{m-1}(a_m x)^{m-1} + \cdots + a_0 a_m^{m-1} = 0, \]

also satisfied by $x = \beta$. But the latter equation is a monic equation in $a_m x$, so
$a_m\beta$ is an algebraic integer $\alpha \in \mathbb{Z}_E$.

Thus, $\beta = \alpha/a_m$ is the quotient of $\alpha \in \mathbb{Z}_E$ by an ordinary integer $a_m \in \mathbb{Z}$. □


A simple instance of the theorem is the extension field $\mathbb{Q}(\sqrt{2})$ of $\mathbb{Q}$, whose
integers are the numbers $p + q\sqrt{2}$ where $p, q \in \mathbb{Z}$. We have seen that a general
element of $\mathbb{Q}(\sqrt{2})$ is of the form $a + b\sqrt{2}$, where $a, b \in \mathbb{Q}$. And $a + b\sqrt{2}$ is
of the form $(p + q\sqrt{2})/r$, where $p, q, r \in \mathbb{Z}$ and $r$ is a common denominator
of $a$ and $b$.
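The step of multiplying through by $a_m^{m-1}$ is easy to mechanize. The Python sketch below is our own illustration (the names `monic_for_scaled_root` and `evaluate` are hypothetical): from the integer coefficients of an equation satisfied by $\beta$ it produces the monic equation satisfied by $a_m\beta$, and checks one rational example exactly.

```python
from fractions import Fraction


def monic_for_scaled_root(coeffs):
    """Given integers [a_m, a_(m-1), ..., a_0] with a_m != 0, return the
    coefficients [1, c_(m-1), ..., c_0] of the monic polynomial satisfied by
    a_m * beta whenever beta is a root of the original polynomial."""
    a_m = coeffs[0]
    if a_m == 0:
        raise ValueError("leading coefficient must be nonzero")
    # Multiplying by a_m^(m-1) turns a_(m-k) * x^(m-k) into
    # a_(m-k) * a_m^(k-1) * (a_m * x)^(m-k).
    return [1] + [coeffs[k] * a_m ** (k - 1) for k in range(1, len(coeffs))]


def evaluate(coeffs, x):
    """Horner evaluation, highest-degree coefficient first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


original = [6, -5, 1]           # 6x^2 - 5x + 1, which has the root beta = 1/3
beta = Fraction(1, 3)
monic = monic_for_scaled_root(original)
alpha = original[0] * beta      # alpha = a_m * beta
assert evaluate(original, beta) == 0
assert evaluate(monic, alpha) == 0
```

In this example $\beta = 1/3$ becomes $\alpha = 2$, a root of $y^2 - 5y + 6$, so $\beta = \alpha/6$ as the theorem predicts.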

Now for some corollaries to the representation of algebraic numbers as 
fractions. 


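Corollary 1. $E = \mathrm{Frac}(\mathbb{Z}_E)$.


Proof. Because $\mathrm{Frac}(\mathbb{Z}_E)$ includes all fractions of the form $\alpha/b$, with $\alpha \in \mathbb{Z}_E$
and $b \in \mathbb{Z} \subseteq \mathbb{Z}_E$, and these fractions make up all of $E$. □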


Corollary 2. If $E$ has degree $m$ over $\mathbb{Q}$, then $E$ has a basis of $m$ elements of
$\mathbb{Z}_E$ as a vector space over $\mathbb{Q}$.


Proof. We know from Section 4.5 that $E$ of degree $m$ is a vector space of
dimension $m$ over $\mathbb{Q}$, so it has a basis consisting of $m$ elements $\beta_i \in E$. By the
theorem above, each $\beta_i = \alpha_i/b_i$, where $\alpha_i \in \mathbb{Z}_E$ and $b_i \in \mathbb{Z}$.

If we multiply the basis elements $\beta_i$ by the least common multiple $b$ of the
$b_i$ we obtain elements $\omega_i = b\beta_i$ which belong to $\mathbb{Z}_E$ and which obviously still
form a basis for $E$ as a vector space over $\mathbb{Q}$. □


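The second corollary means that each $\beta \in E$ can be written in the form

\[ \beta = r_1\omega_1 + \cdots + r_m\omega_m, \quad\text{where } r_1,\ldots,r_m \in \mathbb{Q} \text{ and } \omega_1,\ldots,\omega_m \in \mathbb{Z}_E. \]

If in fact $r_1,\ldots,r_m \in \mathbb{Z}$, then $\beta \in \mathbb{Z}_E$, by the ring properties of $\mathbb{Z}_E$, but the
converse does not necessarily hold. If it does, the $\omega_i$ form what is called an
integral basis for $\mathbb{Z}_E$, which is something much to be desired.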
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Exercises 


We saw, in exercise 1 of Section 4.6, that the ring of integers of $\mathbb{Q}$ is $\mathbb{Z}$. In the
case of quadratic fields $\mathbb{Q}(\sqrt{d})$ for an ordinary integer $d$, it is quite easy to
describe the integers of the field:


• For $d \not\equiv 1 \pmod 4$, they are the $a + b\sqrt{d}$ with $a, b \in \mathbb{Z}$.
• For $d \equiv 1 \pmod 4$, they are the $a + b\sqrt{d}$ with $2a, 2b \in \mathbb{Z}$ and $2a$, $2b$ both
even or both odd.


1. Show first that we can assume $d$ squarefree (not divisible by a square
greater than 1), because if $d = m^2 n$, then $\mathbb{Q}(\sqrt{d}) = \mathbb{Q}(\sqrt{n})$.

2. Show that the equation satisfied by $x = a + b\sqrt{d}$, for $a, b \in \mathbb{Q}$, is

\[ x^2 + Ax + B = 0, \quad\text{where } A = -2a \text{ and } B = a^2 - db^2. \]

Thus, for $a + b\sqrt{d}$ to be an integer of $\mathbb{Q}(\sqrt{d})$ we need $A, B \in \mathbb{Z}$. The
condition $A = -2a \in \mathbb{Z}$ implies either $a \in \mathbb{Z}$ or $2a \in \mathbb{Z}$, with $2a$ odd. Let us
see where both these possibilities lead.


3. Assuming $a \in \mathbb{Z}$ and $a^2 - db^2 \in \mathbb{Z}$, show that $db^2 \in \mathbb{Z}$ and hence $b^2 \in \mathbb{Z}$
because $d$ is squarefree.

4. Conclude from exercise 3 that if $a \in \mathbb{Z}$, then $b \in \mathbb{Z}$.

5. Assuming $2a \in \mathbb{Z}$ with $2a$ odd, and $a^2 - db^2 \in \mathbb{Z}$, show

\[ d(2b)^2 \equiv (2a)^2 \equiv 1 \pmod 4. \]

6. Deduce from exercise 5 that $d \equiv 1 \pmod 4$ and $2b \in \mathbb{Z}$ with $2b$ odd.
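The integrality criterion used in these exercises, namely that $A = -2a$ and $B = a^2 - db^2$ are ordinary integers, can be tested directly on examples. The Python sketch below is our own illustration; the function name is hypothetical, and $d$ is assumed to be a squarefree integer.

```python
from fractions import Fraction


def is_integer_of_quadratic_field(a, b, d):
    """Test whether a + b*sqrt(d) (a, b rational, d a squarefree integer) is an
    algebraic integer, using x^2 + Ax + B = 0 with A = -2a, B = a^2 - d*b^2."""
    a, b = Fraction(a), Fraction(b)
    A = -2 * a
    B = a * a - d * b * b
    return A.denominator == 1 and B.denominator == 1


half = Fraction(1, 2)
# d = 5 is congruent to 1 mod 4, so (1 + sqrt(5))/2 is an integer of Q(sqrt(5)).
assert is_integer_of_quadratic_field(half, half, 5)
# d = 3 is not congruent to 1 mod 4, so (1 + sqrt(3))/2 is not an integer.
assert not is_integer_of_quadratic_field(half, half, 3)
```

Mixed cases such as $\frac{1}{2} + \sqrt{5}$ fail the test, which matches the conclusion of exercise 6 that $2a$ and $2b$ must be odd together.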


8.3 Integral Bases 


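Definition. If $E \supseteq \mathbb{Q}$ is a field extension of finite degree, then $\omega_1,\ldots,\omega_m \in
\mathbb{Z}_E$ are said to be an integral basis for $\mathbb{Z}_E$ if each $\beta \in \mathbb{Z}_E$ is uniquely
expressible in the form

\[ \beta = b_1\omega_1 + \cdots + b_m\omega_m \quad\text{with } b_1,\ldots,b_m \in \mathbb{Z}. \]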


We now show that $\mathbb{Z}_E$ always has an integral basis, with the help of the
discriminant from Section 7.6.


Existence of an integral basis. If $E \supseteq \mathbb{Q}$ is a field extension of degree $m$, then
$\mathbb{Z}_E$ has an integral basis of $m$ elements.


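Proof. We have just seen, in Corollary 2, that $E$ has a basis over $\mathbb{Q}$ consisting
of $\omega_1,\ldots,\omega_m \in \mathbb{Z}_E$. For such a basis, $\Delta(\omega_1,\ldots,\omega_m) \in \mathbb{Z}$ because $\Delta$ is the
determinant of the matrix $T$ with entries $\mathrm{Tr}(\omega_i\omega_j)$ in $\mathbb{Z}$, by the formula for
discriminant in terms of trace in Section 7.6 and the remark near the end of
Section 7.5. Also, $\Delta(\omega_1,\ldots,\omega_m) \neq 0$ by the corollary in Section 7.6, so
$|\Delta(\omega_1,\ldots,\omega_m)|$ is a positive integer.

This means we can choose a basis $\omega_1,\ldots,\omega_m$ of members of $\mathbb{Z}_E$ whose
discriminant has minimal absolute value. We now show that such a basis is an
integral basis for $\mathbb{Z}_E$. If not, there is some $\omega \in \mathbb{Z}_E$ whose (unique) expression

\[ \omega = r_1\omega_1 + \cdots + r_m\omega_m, \quad\text{with } r_1,\ldots,r_m \in \mathbb{Q}, \]

has some $r_i \notin \mathbb{Z}$. By reordering the basis elements, if necessary, we can assume
that $r_1 \notin \mathbb{Z}$.

Then, if $a_1$ is the integer nearest to $r_1$, we have $0 < |r_1 - a_1| \le 1/2$. If we now
replace the basis element $\omega_1$ by

\[ \omega'_1 = \omega - a_1\omega_1 = (r_1 - a_1)\omega_1 + r_2\omega_2 + \cdots + r_m\omega_m, \]

then $\omega'_1 \in \mathbb{Z}_E$ by the ring properties of $\mathbb{Z}_E$, and we still have a basis of $E$
because

\[ \omega_1 = \frac{1}{r_1 - a_1}\left(\omega'_1 - r_2\omega_2 - \cdots - r_m\omega_m\right) \quad\text{and}\quad r_1 - a_1 \neq 0. \]

The matrix for the change of basis from $\omega_1,\omega_2,\ldots,\omega_m$ to $\omega'_1,\omega_2,\ldots,\omega_m$ is

\[
A = \begin{pmatrix}
r_1 - a_1 & r_2 & r_3 & \cdots & r_m \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}.
\]

So, by the formula for the discriminant of a transformed basis in Section 7.6,

\[
\begin{aligned}
|\Delta(\omega'_1,\omega_2,\ldots,\omega_m)| &= \det(A)^2\,|\Delta(\omega_1,\omega_2,\ldots,\omega_m)| \\
&= (r_1 - a_1)^2\,|\Delta(\omega_1,\omega_2,\ldots,\omega_m)| \\
&< |\Delta(\omega_1,\omega_2,\ldots,\omega_m)|
\end{aligned}
\]

because $|r_1 - a_1| \le 1/2$. This contradicts the minimality of $|\Delta(\omega_1,\omega_2,\ldots,\omega_m)|$,
so $\omega_1,\omega_2,\ldots,\omega_m$ is indeed an integral basis. □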


Corollary. Any integral basis of $\mathbb{Z}_E$ has $m$ members.


Proof. Any integral basis of $\mathbb{Z}_E$ is also a vector space basis for $E$ over $\mathbb{Q}$,
because each $\beta \in E$ equals $\alpha/b$ where $\alpha \in \mathbb{Z}_E$ and $b \in \mathbb{Z}$. Hence, the basis
has $m$ members by the invariance of vector space dimension. □
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With this theorem and its corollary, we have clawed back the advantages 
that vector spaces usually have over modules — basis and dimension — albeit 
for a limited class of modules. What we have shown is that the ring $\mathbb{Z}_E$ of
integers in an algebraic number field of degree $m$ is a free $\mathbb{Z}$-module of rank
$m$, in terminology to be defined in the next section. Of course, the rings $\mathbb{Z}_E$
are precisely the rings we are interested in, so we could be satisfied with this
special case. But it is worth saying a little more about the general concepts of 
free R-modules and their rank. 


Exercises 


Following the determination of the integers of $\mathbb{Q}(\sqrt{d})$ for squarefree $d \in \mathbb{Z}$, in
the previous exercise set, we can now find an integral basis for these integers.


1. If $d \not\equiv 1 \pmod 4$, show that $1, \sqrt{d}$ is an integral basis for the integers of
$\mathbb{Q}(\sqrt{d})$.
2. If $d \equiv 1 \pmod 4$, show that $1, \frac{1+\sqrt{d}}{2}$ is an integral basis for the integers
of $\mathbb{Q}(\sqrt{d})$.

These results confirm that the integers we previously found in the fields
$\mathbb{Q}(\sqrt{-3})$ and $\mathbb{Q}(\sqrt{-5})$, namely the numbers $a + b\,\frac{1+\sqrt{-3}}{2}$ and $a + b\sqrt{-5}$
for $a, b$ in $\mathbb{Z}$, are in fact all the integers in these fields.


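Another interesting example is the cyclotomic field $\mathbb{Q}(\zeta_p)$, where $p$ is
prime and $\zeta_p$ is the following solution of $x^p - 1 = 0$:

\[ \zeta_p = \cos\frac{2\pi}{p} + i\sin\frac{2\pi}{p}. \]

We calculated some norms and traces for this field in the exercises to Section
7.5, in particular

\[ \mathrm{Tr}(1 - \zeta_p) = \mathrm{Tr}(1 - \zeta_p^2) = \cdots = \mathrm{Tr}(1 - \zeta_p^{p-1}) = p, \tag{*} \]
\[ N(1 - \zeta_p) = (1 - \zeta_p)(1 - \zeta_p^2)\cdots(1 - \zeta_p^{p-1}) = p. \tag{**} \]

First, we show that $(1 - \zeta_p)\mathbb{Z}_{\mathbb{Q}(\zeta_p)} \cap \mathbb{Z} = p\mathbb{Z}$; that is, the multiples of $1 - \zeta_p$
by integers of $\mathbb{Q}(\zeta_p)$ that are in $\mathbb{Z}$ are the multiples of $p$.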


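3. Deduce from (**) that $p \in (1 - \zeta_p)\mathbb{Z}_{\mathbb{Q}(\zeta_p)}$, and hence that
$(1 - \zeta_p)\mathbb{Z}_{\mathbb{Q}(\zeta_p)} \cap \mathbb{Z}$ is an ideal of $\mathbb{Z}$ containing $p\mathbb{Z}$.

4. If $(1 - \zeta_p)\mathbb{Z}_{\mathbb{Q}(\zeta_p)} \cap \mathbb{Z} \neq p\mathbb{Z}$, deduce that $1 \in (1 - \zeta_p)\mathbb{Z}_{\mathbb{Q}(\zeta_p)}$, so $1 - \zeta_p$
(and all its conjugates) divide 1 in $\mathbb{Z}_{\mathbb{Q}(\zeta_p)}$.

5. Deduce from (**) and the hypothesis in exercise 4 that $1/p$ is in $\mathbb{Z}_{\mathbb{Q}(\zeta_p)}$,
and hence in $\mathbb{Z}$, which is a contradiction.

Next we show that, for any $x \in \mathbb{Z}_{\mathbb{Q}(\zeta_p)}$, $\mathrm{Tr}(x(1 - \zeta_p)) \in p\mathbb{Z}$.


6. By factorizing $1 - \zeta_p^k$, show that all the conjugates of $1 - \zeta_p$ are multiples
of $1 - \zeta_p$, and hence that the ordinary integer $\mathrm{Tr}(x(1 - \zeta_p))$ is in $p\mathbb{Z}$.


Finally, by writing $x = a_0 + a_1\zeta_p + \cdots + a_{p-2}\zeta_p^{p-2}$ (a combination of the
basis elements $1, \zeta_p, \ldots, \zeta_p^{p-2}$ with rational coefficients) and taking the trace
of $1 - \zeta_p$ times both sides of this equation, we can show that $a_0, a_1, \ldots$ in turn
are in $\mathbb{Z}$, so that integers of $\mathbb{Q}(\zeta_p)$ are in $\mathbb{Z}[\zeta_p]$.


7. Applying Tr to

\[ x(1 - \zeta_p) = a_0(1 - \zeta_p) + a_1(\zeta_p - \zeta_p^2) + \cdots + a_{p-2}(\zeta_p^{p-2} - \zeta_p^{p-1}), \]

and observing that $\mathrm{Tr}(\zeta_p^k - \zeta_p^{k+1}) = 0$ by (*) (can you see why?), show
that $a_0 p \in p\mathbb{Z}$ and hence $a_0 \in \mathbb{Z}$.

8. Then, since $\zeta_p^{-1} = \zeta_p^{p-1} \in \mathbb{Z}_{\mathbb{Q}(\zeta_p)}$, we have

\[ (x - a_0)\zeta_p^{-1} = a_1 + a_2\zeta_p + \cdots + a_{p-2}\zeta_p^{p-3} \in \mathbb{Z}_{\mathbb{Q}(\zeta_p)}. \]

By suitable repetition of the argument in the previous question, show that
$a_1, a_2, \ldots \in \mathbb{Z}$, so that $x \in \mathbb{Z}[\zeta_p]$.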
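The identities (*) and (**) that drive these exercises can be checked numerically for small primes. The following Python sketch is our own illustration and uses floating-point complex arithmetic, so it is a sanity check rather than a proof. It relies on the standard fact that the conjugates of $\zeta_p$ are $\zeta_p^j$ for $1 \le j \le p-1$; the function names are hypothetical.

```python
import cmath


def zeta(p):
    """The p-th root of unity cos(2*pi/p) + i*sin(2*pi/p)."""
    return cmath.exp(2j * cmath.pi / p)


def trace_of_one_minus_power(p, k):
    """Tr(1 - zeta^k): sum over the conjugates zeta -> zeta^j, 1 <= j <= p-1."""
    z = zeta(p)
    return sum(1 - z ** (j * k) for j in range(1, p))


def norm_of_one_minus_zeta(p):
    """N(1 - zeta): product of the conjugates 1 - zeta^j, 1 <= j <= p-1."""
    z = zeta(p)
    result = 1
    for j in range(1, p):
        result *= 1 - z ** j
    return result


for p in (3, 5, 7, 11):
    for k in range(1, p):
        assert abs(trace_of_one_minus_power(p, k) - p) < 1e-9   # identity (*)
    assert abs(norm_of_one_minus_zeta(p) - p) < 1e-9            # identity (**)
```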


8.4 Bases and Free Modules 


As we saw in Section 6.1, a key feature of an F-vector space is the existence 
of a basis, which allows each v € V to be expressed uniquely as a linear 
combination of basis elements with coefficients in the field $F$. In an $R$-module
$M$ it may be possible to find generators for $M$, in the sense that every member
$m \in M$ is a linear combination of generators with coefficients in $R$. But even
if the generators are independent, in the sense that no one of them is a linear
combination of the others, the expression for $m \in M$ is not necessarily unique.

Examples are the finite Abelian groups, mentioned in Section 8.1 as 
examples of Z-modules. Specifically, and using the additive notation from 
Section 3.7, consider the Z-module 


\[ \mathbb{Z}_2 \oplus \mathbb{Z}_3 = \{ai + bj : a, b \in \mathbb{Z}\}, \]

where $i$ = congruence class of 1, mod 2, and $j$ = congruence class of 1, mod 3.


These generators $i$ and $j$ are subject to the relations $2i = 0$ and $3j = 0$.
Although $i$ and $j$ are independent, their coefficients $a, b$ are not unique, since
$(a + 2)i = ai$ and $(b + 3)j = bj$. Indeed, there are only six different elements
$ai + bj$, which is why $\mathbb{Z}_2 \oplus \mathbb{Z}_3$ is a finite Abelian group.

If an $R$-module $M$ has generators $m_i$ not subject to any relations other than
those that hold in any module, such as $m_i + m_j = m_j + m_i$, then $M$ is called a
free $R$-module. Thus, the $\mathbb{Z}$-module $\mathbb{Z}_2 \oplus \mathbb{Z}_3$ is not free because its generators
$i$ and $j$ are subject to relations and, equivalently, their coefficients $a, b \in \mathbb{Z}$ are
not unique.
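The failure of uniqueness is easy to see by enumeration. In the Python sketch below, which is our own illustration, the element $ai + bj$ is stored as the pair of residues `(a mod 2, b mod 3)`; the function name `element` is hypothetical.

```python
def element(a, b):
    """The element a*i + b*j of Z_2 (+) Z_3, stored as (a mod 2, b mod 3)."""
    return (a % 2, b % 3)


# Many coefficient pairs (a, b), but only six distinct elements a*i + b*j.
elements = {element(a, b) for a in range(-10, 11) for b in range(-10, 11)}
assert len(elements) == 6

# Different coefficients can give the same element, so they are not unique.
assert element(1, 2) == element(3, 5) == element(-1, -1)
```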


Definitions. An $R$-module $M$ is called free if there are generators $m_i$ of $M$
such that each $m \in M$ is a linear combination of (finitely many) $m_i$ with unique
coefficients $a_i \in R$. A generating set with this property is called a basis of $M$,
and its size is called the rank of $M$ over $R$.


Examples of free $R$-modules are of course the $F$-vector spaces, but also the
direct sums $\mathbb{Z} \oplus \mathbb{Z}$, $\mathbb{Z} \oplus \mathbb{Z} \oplus \mathbb{Z}$, and so on, which are free $\mathbb{Z}$-modules, and the
ring $\mathbb{Z}_E$ for each field extension $E \supseteq \mathbb{Q}$ of degree $m$, which is a free $\mathbb{Z}$-module
of rank $m$, as we saw in the previous section. More generally, for any ring $R$, we
define $R^{(n)}$ to be the set of ordered $n$-tuples $(a_1,\ldots,a_n)$, for $a_1,\ldots,a_n \in R$,
under the addition operation defined by

\[ (a_1,\ldots,a_n) + (b_1,\ldots,b_n) = (a_1 + b_1,\ldots,a_n + b_n), \]

and the scalar multiplication operation defined, for any $c \in R$, by

\[ c(a_1,\ldots,a_n) = (ca_1,\ldots,ca_n). \]

Then $R^{(n)}$ is a free $R$-module with $n$ basis elements of the form
$(0,\ldots,1,\ldots,0)$ (one place equal to 1 and the rest 0). Conversely, any free
$R$-module $M$ with a basis $m_1,\ldots,m_n$ is isomorphic to $R^{(n)}$, because each
$m \in M$ corresponds to the unique $n$-tuple $(a_1,\ldots,a_n) \in R^{(n)}$ such that
$m = a_1m_1 + \cdots + a_nm_n$.

Free $R$-modules share with vector spaces a notion of dimension, called rank
in the definition above, because of a theorem: If $R^{(m)}$ is isomorphic to $R^{(n)}$,
then $m = n$. We proved this theorem in the special case where $R = \mathbb{Z}$ in the
previous section. The case of general $R$ requires a rather specialized theorem
about determinants, and we do not need it, so we omit the proof.


8.4.1 Free Z-modules 


Another property that free R-modules share with vector spaces is a “subspace” 
property like that found in Section 6.2. However, we now have to restrict R 
to be a principal ideal domain, mentioned in Section 5.2. In fact, we need
the result only for the domain $\mathbb{Z}$, so we will write the proof in terms of $\mathbb{Z}$.
(However, the proof is identical with any PID in place of $\mathbb{Z}$.)


Submodules of a free $\mathbb{Z}$-module. Any $\mathbb{Z}$-submodule $M$ of $\mathbb{Z}^{(n)}$ is a free
$\mathbb{Z}$-module of rank $m \le n$.


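Proof. The result is trivial for $n = 0$ because $\mathbb{Z}^{(0)} = \{0\}$. Now take $n > 0$, let
$u_1, u_2, \ldots, u_n$ be a basis of $\mathbb{Z}^{(n)}$, and suppose the theorem is true for $n - 1$. Then
it holds in particular for $\mathbb{Z}^{(n-1)}$, the submodule of $\mathbb{Z}^{(n)}$ generated by $u_2, \ldots, u_n$.
So the theorem holds for a submodule $M \subseteq \mathbb{Z}^{(n-1)}$.

Assume now that $M \not\subseteq \mathbb{Z}^{(n-1)}$. The part $M \cap \mathbb{Z}^{(n-1)}$ is a submodule of
$\mathbb{Z}^{(n-1)}$, so it has a basis $v_2, \ldots, v_m$ with $m - 1 \le n - 1$ elements. We now seek
an element $v_1$ outside $M \cap \mathbb{Z}^{(n-1)}$ that completes a basis for $M$.

Consider the set $I \subseteq \mathbb{Z}$ of “$u_1$ components” of members of $M$:

\[ I = \{x \in \mathbb{Z} : xu_1 + y \in M \text{ for some } y \in \mathbb{Z}^{(n-1)}\}. \]

Then $I$ is a nonzero ideal of $\mathbb{Z}$ and hence, since $\mathbb{Z}$ is a PID, $I$ is generated by
some nonzero $d \in \mathbb{Z}$. That is, $I = (d)$. This means there is an element

\[ v_1 = du_1 + y_1 \in M \quad\text{for some } y_1 \in \mathbb{Z}^{(n-1)}. \tag{*} \]

To show that $v_1, v_2, \ldots, v_m$ is a basis for $M$, we show:

1. $v_1, v_2, \ldots, v_m$ generate $M$. Well,

\[
\begin{aligned}
z \in M &\Rightarrow z = xu_1 + y && \text{for some } x \in I \text{ and } y \in \mathbb{Z}^{(n-1)}, \\
&\Rightarrow z = d_1du_1 + y && \text{for some } d_1 \in \mathbb{Z}, \text{ since } I = (d), \\
&\Rightarrow z - d_1v_1 = (d_1du_1 + y) - d_1(du_1 + y_1) && \text{by definition (*)}, \\
&\Rightarrow z - d_1v_1 = y - d_1y_1 \\
&\Rightarrow z - d_1v_1 \in M \cap \mathbb{Z}^{(n-1)} && \text{since } y, y_1 \in \mathbb{Z}^{(n-1)} \text{ and } z - d_1v_1 \in M, \\
&\Rightarrow z - d_1v_1 = \sum_{j=2}^{m} d_jv_j && \text{since } v_2, \ldots, v_m \text{ generate } M \cap \mathbb{Z}^{(n-1)}, \\
&\Rightarrow z = \sum_{j=1}^{m} d_jv_j, && \text{so } v_1, v_2, \ldots, v_m \text{ generate } M.
\end{aligned}
\]

2. $v_1, v_2, \ldots, v_m$ are independent.
Suppose that $0 = \sum_{j=1}^{m} d_jv_j$, which gives, by (*),

\[ 0 = d_1du_1 + d_1y_1 + \sum_{j=2}^{m} d_jv_j. \]

Since $y_1, v_2, \ldots, v_m \in \mathbb{Z}^{(n-1)}$ and $u_1 \notin \mathbb{Z}^{(n-1)}$, it follows that $d_1d = 0$,
hence $d_1 = 0$. But this leaves

\[ 0 = \sum_{j=2}^{m} d_jv_j. \]

And $v_2, \ldots, v_m$ is a basis for $M \cap \mathbb{Z}^{(n-1)}$, so $d_2 = \cdots = d_m = 0$ also. □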
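When $M$ is given by finitely many generators, the induction in this proof can be turned into a procedure. The Python sketch below is our own adaptation, not the author's method: it takes $u_1, \ldots, u_n$ to be the standard basis of $\mathbb{Z}^{(n)}$, finds $d$ and $v_1$ with the extended Euclidean algorithm, and then subtracts multiples of $v_1$ from the generators to obtain generators of $M \cap \mathbb{Z}^{(n-1)}$. The names `ext_gcd` and `submodule_basis` are hypothetical.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) >= 0 and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def submodule_basis(generators, k=0):
    """Basis of the Z-submodule of Z^n spanned by `generators` (integer tuples).

    Every generator is assumed to have zeros in all coordinates before k.
    """
    generators = [g for g in generators if any(g)]
    if not generators:
        return []
    n = len(generators[0])
    # d generates the ideal I of k-th components; v plays the role of v_1.
    d, v = 0, (0,) * n
    for g in generators:
        d, s, t = ext_gcd(d, g[k])
        v = tuple(s * vi + t * gi for vi, gi in zip(v, g))
    if d == 0:
        # M has no k-th component at all: move on to the next coordinate.
        return submodule_basis(generators, k + 1)
    # Subtracting multiples of v leaves members of M with k-th component 0,
    # and these generate the part of M with k-th component 0.
    rest = [
        tuple(gi - (g[k] // d) * vi for gi, vi in zip(g, v)) for g in generators
    ]
    return [v] + submodule_basis(rest, k + 1)


# The ideal of Z[sqrt(-5)] generated by 2 and 1 + sqrt(-5), in coordinates
# (a, b) for a + b*sqrt(-5): the generators and their multiples by sqrt(-5).
ideal_generators = [(2, 0), (1, 1), (0, 2), (-5, 1)]
assert submodule_basis(ideal_generators) == [(1, 1), (0, 2)]
```

The four generators collapse to a basis of two elements, and a submodule such as the one spanned by `(2, 4)` and `(3, 6)` comes out with the single basis element `(1, 2)`, so the rank never exceeds $n$.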


Now it is time to reiterate that the free Z-modules we have in mind are 
the algebraic integer rings $\mathbb{Z}_E$ in field extensions $E$ of finite degree over $\mathbb{Q}$. We
have been interested in the ideals of such number rings since Section 5.2, where
we found an ideal in the ring $\mathbb{Z}[\sqrt{-5}]$ with the two generators 2 and $1 + \sqrt{-5}$.
The theorem above now tells us that any ideal of an algebraic integer ring $\mathbb{Z}_E$
is finitely generated, because an ideal of $\mathbb{Z}_E$ is a $\mathbb{Z}$-submodule. In particular,
this tells us that each ring $\mathbb{Z}_E$ is Noetherian, as defined in Section 5.4.


Exercises 


Any $R$-module $M$ can be viewed as the result of imposing relations on a free
$R$-module $R^{(n)}$, as we did with the example of $\mathbb{Z}_2 \oplus \mathbb{Z}_3$ above. The idea of
“imposing relations” on $R^{(n)}$ can be viewed as forming $M$ as the quotient of
$R^{(n)}$ by the $R$-submodule whose elements equal 0 in $M$. Here is how we arrive
at $\mathbb{Z}_2 \oplus \mathbb{Z}_3$ from the free $\mathbb{Z}$-module

\[ \mathbb{Z}^{(2)} = \{ai + bj : a, b \in \mathbb{Z}\}. \]

1. Let $G = \{2ai + 3bj : a, b \in \mathbb{Z}\}$. Show that $G$ is the free $\mathbb{Z}$-submodule of
$\mathbb{Z}^{(2)}$ generated by $2i$ and $3j$.

2. Show that $\mathbb{Z}^{(2)}/G$ is a $\mathbb{Z}$-module in which $2[i] = 3[j] = 0$, where $[i]$ and
$[j]$ are the congruence classes, mod $G$, of $i$ and $j$, respectively.

3. Deduce that $\mathbb{Z}^{(2)}/G \cong \mathbb{Z}_2 \oplus \mathbb{Z}_3$.


A nonzero ideal of $\mathbb{Z}_E$ is not just finitely generated, but of the same rank.


4. Let $\omega_1, \ldots, \omega_m$ be an integral basis of $\mathbb{Z}_E$, as found in the previous section.
If $I$ is a nonzero ideal of $\mathbb{Z}_E$ and $\alpha \in I$ is nonzero, explain why
$\alpha\omega_1, \ldots, \alpha\omega_m \in I$.
5. Conclude that $I$ has the same rank, $m$, as $\mathbb{Z}_E$.


8.5 Integers over a Ring 


The Noetherian property is one characteristic of algebraic integer rings likely 
to be useful in studying ideals. Another is the property of “integer” itself, which 
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we now generalize to an arbitrary ring R. The obvious way to generalize the 
concept of integer is to take the algebraic integer concept, from Section 4.6, 
and replace the (ordinary) integers in its definition by elements of a ring R. 


Definition. If $R$ is a ring and $\alpha$ belongs to some extension of $R$, then $\alpha$ is
called integral over $R$ if $\alpha$ satisfies an equation of the monic polynomial form

\[ x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 = 0, \quad\text{where } a_0, a_1, \ldots, a_{n-1} \in R. \]


Thus, what we called the algebraic integers in Section 4.6 are the integers 
over Z. As with algebraic integers, it is usually convenient to confine integers 
over R to a field of finite degree over R. To extend R to a field at all, R must 
be a domain, in which case R embeds in its field of fractions Frac(R). 


Definition and notation. If R is a domain and $E \supseteq \mathrm{Frac}(R)$ is an extension
field of finite degree, then the integers of E over R are the $\alpha \in E$ that are integral
over R.


When it is clear which R the integers of E are over, we will simply call them 
“integers of E.” These integers form a ring, as the following theorem shows. 


Ring properties of the integers of an extension field. If R is a domain and
$E \supseteq \mathrm{Frac}(R)$ is an extension field of finite degree, then the integers of E form
a ring.


Proof. As in the special case where R = Z (Section 4.6), it suffices to prove
that $\alpha + \beta$, $\alpha - \beta$, and $\alpha\beta$ are integers of E when $\alpha$ and $\beta$ are. (Because then
the ring properties of the integers of E are inherited from the ring properties
of E.)

The proof that $\alpha + \beta$, $\alpha - \beta$, and $\alpha\beta$ are integers is also much the same as
when R = Z. This is because the required properties of determinants hold with
an arbitrary domain in place of Z, as we remarked in Section 7.3. □


It is interesting and useful to know that all elements of E are actually 
fractions formed from its integers. In fact they are fractions of the special form 
(integer of E)/(member of R). The proof is exactly the same as in the special 
case R = Z, which we proved in Section 8.2, with R in place of Z. So, even 
in this general setting, elements of a field are “fractions” whose numerator and 
denominator are “integers” (since elements of R are trivially integers over R). 


Exercises 


An important application of the integer concept outside domains of numbers is 
where R = C[x], the ring of polynomials with complex coefficients. 
1. Explain why C[x] is a domain, so the field F = Frac(C[x]) exists.


The elements of F are called rational functions. (Indeed, in some of the 
literature, polynomials are called “integral rational functions.”) An extension 
field $E \supseteq F$ of finite degree over F is called an algebraic function field.


2. Deduce that an algebraic function f satisfies an equation of the form
$a_k f^k + \cdots + a_1 f + a_0 = 0$, where $a_0, \ldots, a_k \in \mathbb{C}[x]$.


3. Explain how each algebraic function f may be realized as a congruence
class of polynomials in f, illustrating the construction with $f(x) = \sqrt{x}$.


As one would expect, the functions satisfying monic polynomial equations 
with coefficients in C[x] are called integral algebraic functions. 


4. Show that $f(x) = 1 + 2\sqrt{x} + x$ is an integral algebraic function of degree
2 over C[x].

5. Express g(x) = = _ as the quotient of an integral algebraic function by 
a polynomial, and explain why this is possible for any algebraic function. 


8.6 Integral Closure 


In the previous section we established a convenient setting for studying
generalized integers: a domain R, an extension field $E \supseteq \mathrm{Frac}(R)$ of finite
degree, and the ring of elements of E that are integral over R.


Definition. Given a domain R and a finite-degree extension $E \supseteq \mathrm{Frac}(R)$, the
ring of elements of E that are integral over R is called the integral closure of
R in E and is denoted by $\mathbb{Z}_{E/R}$.


The word “closure” suggests that $\mathbb{Z}_{E/R}$ is “closed” in some sense, and the
sense is given by the following definition.


Definition. A domain D is called integrally closed if it equals its own integral 
closure in Frac(D). 


The domain $\mathbb{Z}_{E/R}$ is indeed integrally closed. To prove this, we have to
show that if an x in E is integral over $\mathbb{Z}_{E/R}$, then x is already integral over R.
The proof goes more smoothly if we first establish some equivalent ways of
expressing the relation “integral over.”


Equivalents of being integral over R. If a ring $S \supseteq$ ring R and $x \in S$, then
the following properties of x are equivalent:


https://doi.org/10.1017/97810090041 38.010 Published online by Cambridge University Press 




1. x is integral over R, 
2. R[x] is a finitely generated R-module, 
3. S has a subring $Q \supseteq R[x]$ that is a finitely generated R-module.


Proof. We show that $1 \Rightarrow 2 \Rightarrow 3 \Rightarrow 1$.
$1 \Rightarrow 2$ because an integer x over R satisfies an equation


\[ x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0 = 0, \quad \text{where } a_0, \ldots, a_{m-1} \in R. \]


This implies (as previously observed in Section 4.6) that


\[ x^m = -a_{m-1}x^{m-1} - \cdots - a_1 x - a_0, \]

so $x^m$ is in the R-module generated by $1, x, \ldots, x^{m-1}$. In turn,

\[ x^{m+1} = -a_{m-1}x^{m} - \cdots - a_1 x^2 - a_0 x, \]

so $x^{m+1}$ is also in the R-module generated by $1, x, \ldots, x^{m-1}$ since $x^m$ is.
Similarly for all other powers $x^{m+2}, x^{m+3}, \ldots$, so in fact R[x] is the R-module
generated by $1, x, \ldots, x^{m-1}$.


$2 \Rightarrow 3$ by taking Q to be the R-module generated by $1, x, \ldots, x^{m-1}$.

Finally, we show that $3 \Rightarrow 1$ by supposing that there is a finitely generated
R-module $Q \supseteq R[x]$. Let $v_1, \ldots, v_m$ be generators of Q. Then, since $x \in Q$,
each $xv_i$ is a linear combination of $v_1, \ldots, v_m$ with coefficients in R, so we
have the following homogeneous equations in the $v_j$:


\[ xv_i = \sum_{j=1}^{m} a_{ij} v_j, \quad \text{where the } a_{ij} \in R. \]


Since these equations have a nonzero solution, their determinant


\[
\det\begin{pmatrix}
a_{11}-x & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22}-x & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mm}-x
\end{pmatrix} = 0.
\]


Since the highest power of x in the determinant is $\pm x^m$, this gives a monic
polynomial equation for x, so x is integral over R. □
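
The step from 3 to 1 is effectively a recipe: write down the matrix of "multiplication by x" on a finite generating set and expand the determinant. The following Python sketch tries this for $x = \sqrt{2}+\sqrt{3}$ acting on the $\mathbb{Z}$-module generated by $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$. The example, the helper names `matmul` and `char_poly`, and the use of the Faddeev-LeVerrier recurrence in place of a direct determinant expansion are illustrative assumptions, not part of the text.

```python
import math

def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(t*I - A) (Faddeev-LeVerrier)."""
    n = len(A)
    M = [[0] * n for _ in range(n)]
    coeffs = [1]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A*M_{k-1} + (previous coefficient) * I
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(A, M)[i][i] for i in range(n))
        coeffs.append(-trace // k)  # the division is exact for integer matrices
    return coeffs

# Row i lists the coefficients of x*v_i in terms of v = (1, sqrt2, sqrt3, sqrt6).
A = [
    [0, 1, 1, 0],  # x*1     = sqrt2 + sqrt3
    [2, 0, 0, 1],  # x*sqrt2 = 2 + sqrt6
    [3, 0, 0, 1],  # x*sqrt3 = 3 + sqrt6
    [0, 3, 2, 0],  # x*sqrt6 = 3*sqrt2 + 2*sqrt3
]
coeffs = char_poly(A)

# Numerical sanity check that x is a root of the monic polynomial found.
x = math.sqrt(2) + math.sqrt(3)
value = sum(c * x ** (len(coeffs) - 1 - i) for i, c in enumerate(coeffs))
```

Here `coeffs` encodes a monic polynomial with integer coefficients, which is the kind of equation the proof extracts from the determinant.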


Next, it is natural to extend the relation “integral over” from elements to
whole rings.


Definition. If R is a subring of a ring S, we call S integral over R if each x 
in S is integral over R. 






Then $\mathbb{Z}_{E/R}$ is integrally closed by the following proposition.


Transitivity of “integral over.” If $T \supseteq S \supseteq R$ are rings such that T is integral
over S and S is integral over R, then T is integral over R.


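Proof. If $x \in T$ we have a polynomial equation

\[ x^m + b_{m-1}x^{m-1} + \cdots + b_1 x + b_0 = 0, \quad \text{where } b_0, \ldots, b_{m-1} \in S, \]

because x is integral over S. The polynomial has coefficients in
$R[b_0, \ldots, b_{m-1}]$, so x is integral over $S' = R[b_0, \ldots, b_{m-1}]$.

Each $b_i$ is integral over R, so $S'$ is a finitely generated R-module, and
$S'[x]$ is a finitely generated $S'$-module because x is integral over $S'$.
It follows that $Q = S'[x] = R[b_0, \ldots, b_{m-1}, x]$ is a finitely generated
R-module containing R[x].

Then x is integral over R by the theorem above and, since x is an arbitrary
member of T, T is integral over R. □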


Corollary. $\mathbb{Z}_{E/R}$ is integrally closed.


Proof. $\mathbb{Z}_{E/R}$ is integral over R by definition. Now consider some x in E that
is integral over $\mathbb{Z}_{E/R}$. It follows from the proof of transitivity that x is integral
over R. Hence $x \in \mathbb{Z}_{E/R}$, by definition of $\mathbb{Z}_{E/R}$. □


This corollary applies in particular to the rings $\mathbb{Z}_E$ of integers in algebraic
number fields. Thus, we have now established that $\mathbb{Z}_E$ is integrally closed as
well as being Noetherian, which we established in Section 8.4. It remains now
to establish a property of the prime ideals of $\mathbb{Z}_E$, which we study in the next
chapter.


Exercises 


1. Revisit exercise 1 of Section 4.6 to give a direct proof that Z is integrally
closed.


8.7 Discussion 


As we have seen in this chapter, the initial motivation for studying modules 
was the theory of algebraic integers and their applications to arithmetic. 
Dedekind (1871) introduced this theory and rapidly covered everything from 
the definition of algebraic integer to finding an integral basis for the ring $\mathbb{Z}_E$
of an algebraic number field of degree n, thereby showing that $\mathbb{Z}_E$ is (what we
would call) a free Z-module of rank n. In 1877, disappointed by the lack of 
response to his ideas, he gave an account for a more general audience, which 
is now available in English as Dedekind (1996). 
In this book, Dedekind began with the concept of module (with the example
of algebraic integers in mind) and proceeded to study congruence modulo a 
submodule and its congruence classes, hence the quotient of modules. Then he 
specialized to finitely generated modules and their bases, before introducing 
the concepts of algebraic integers and ideals. Algebraic integers are situated 
in algebraic number fields of finite degree, and the discriminant is introduced 
in order to find integral bases for the integers of a field. In particular, finding 
an integral basis by minimizing the discriminant appears in (Dedekind, 1996, 
p. 116). 


8.7.1 Integers over a Domain 


The generalization of Dedekind’s theory to integers over a domain R was 
made by Noether (1926), and brought to a wide audience in Chapter 14 of van 
der Waerden (1931), the pioneering “modern algebra” text. Noether began by 
defining an element x to be integral over R if its powers belong to an R-module 
of finite rank (number 3 of the three equivalents of “integral” in Section 8.6), 
evidently finding this more natural than the definition via monic polynomials. 

She then proceeded to show her definition equivalent to the “usual con- 
ception” of integer, defined by the monic polynomial condition, which had 
prevailed rather awkwardly since the mid-19th century. 

Noether’s theory of integers over a domain includes the concept of integral
closure, which is the second leg of “an abstract characterization of those rings
whose ideal theory coincides with that of the integers of an algebraic number
field.” The first leg was the Noetherian property of Section 5.4, which she
introduced in Noether (1921). Thus Noether is looking for the “right concepts”
to capture unique prime ideal factorization. To complete the list of concepts,
one needs a condition on the ideals, which we study in the next chapter. That
condition, too, is motivated by the properties of algebraic number rings.


Figure 8.1 Bartel Leendert van der Waerden (1903-1996) (public domain).

Figure 8.2 Heinrich Martin Weber (1842-1913) (public domain).


8.7.2 Rings of Algebraic Functions 


Noether (1926) also remarked that her theory applies to integers over algebraic 
function fields (“integral algebraic functions”). We have sketched the basic 
ideas of these fields in the exercises to Section 8.5, and will say more about 
them in Section 9.8. 

The extension of Dedekind’s theory of algebraic integers to function fields 
was made by Dedekind and Weber (1882), in a paper now available in English 
translation as Dedekind and Weber (2012). As Noether realized, this extension 
reveals the potentially vast applicability of ideal theory to algebraic geometry. 
Ideals and Prime Factorization


Preview 


In previous chapters we have extended the concepts of addition, multiplication, 
rational numbers, and integers to the general setting of rings and fields. This 
setting brings to light the concepts of vector space and module, which clarify 
the nature of many rings and fields by giving them a somewhat geometric 
structure—in particular a concept of “dimension.” 

With this structural understanding of rings, we can now return to the 
problem that provoked all these developments: how to define a notion of 
“prime” for which “unique prime factorization” is true. We hope to have 
this notion at least for the rings $\mathbb{Z}_E$ of algebraic integers, which serve as a
microscope for studying the ordinary integers. 

The appropriate concept of “prime” is that of a prime ideal, and the 
rings admitting “unique factorization” of prime ideals are called Dedekind 
domains. We already have all the concepts needed to define Dedekind domains 
and to prove their basic properties, except for the concept of prime ideal itself. 
Defining prime ideals is actually easy to do: it is motivated by Euclid’s result 
that if a prime p divides a product ab, then p divides a or p divides b. 

The concept of Dedekind domain appears complicated at first, involving the 
concepts of integrality, finite-dimensionality, and prime ideals. But in proving 
the existence of unique prime factorization, we invoke another concept — that of 
inverse ideals — which turns out to characterize the Dedekind property exactly 
and concisely. 

In a reasonable sense, this characterization shows that Dedekind domains 
are the “right concept” to explain unique prime factorization. 


9.1 To Divide Is to Contain


The simplest ideals that we have seen (in Section 5.2) are the principal ideals, 
which indeed are the only ideals in certain rings. These are the principal ideal 
domains (PIDs), and the first example is Z. A principal ideal is one of the form 


\[ (a) = \{ra : r \in R\}, \]


and it closely imitates the element a itself, as far as divisibility is concerned. 
In particular 


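\[
\begin{aligned}
b \text{ divides } a &\iff a = bc \text{ for some } c \in R, \\
&\iff ra = (rc)b \in (b) \text{ for any } r \in R, \\
&\iff (b) \supseteq (a).
\end{aligned}
\]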


So, for principal ideals, we can say that “to divide is to contain.”
For ideals in arbitrary rings, containment is the definition of “divides.” 


Definition. If I and J are ideals of a ring R, then we say that I divides J if
$I \supseteq J$.


With this definition of “divides,” what should the greatest common divisor 
be? Well, first, note that R is itself an ideal that contains all ideals, so R divides 
all ideals, and so it should be the unit ideal. Indeed, R is the principal ideal (1). 
More generally, we have a gcd of any two ideals. 


Greatest common divisor of two ideals. If I and J are ideals of a ring R,
then $\gcd(I, J) = \{ri + sj : i \in I,\ j \in J, \text{ and } r, s \in R\}$.


Proof. By definition of divisibility, any ideal L that divides (that is, contains)
both I and J should divide (that is, contain) their gcd. Thus gcd(I, J) should
be the “smallest” ideal containing I and J, which means that gcd(I, J) is the
intersection of all ideals containing both I and J.

This set is


\[ K = \{ri + sj : i \in I,\ j \in J;\ r, s \in R\}. \tag{*} \]

It is clear that K is closed under sums, and under products by members of R, so
K is an ideal. Also, K is the “least” such ideal, because any ideal containing I
and J must contain the sums of all their members, and such sums are precisely
the members of K. □


In the PID Z, this theorem says that

\[ \gcd((a), (b)) = \{ma + nb : m, n \in \mathbb{Z}\}. \]

And, as we know from Section 1.2,

\[ \{ma + nb : m, n \in \mathbb{Z}\} = \{\text{multiples of } \gcd(a, b)\} = (\gcd(a, b)). \]
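
The equality of the two sets can be spot-checked on a finite window of integers. The Python sketch below does this for a = 12 and b = 18; the sample values, the search bound, and the helper name `combinations` are our own choices for illustration, and a finite check is of course not a proof.

```python
from math import gcd

def combinations(a, b, bound):
    """All values m*a + n*b with |m|, |n| <= bound."""
    return {m * a + n * b
            for m in range(-bound, bound + 1)
            for n in range(-bound, bound + 1)}

a, b, bound = 12, 18, 10
g = gcd(a, b)

# Compare the two descriptions of the ideal inside the window [-30, 30].
window = range(-30, 31)
from_ideals = {v for v in combinations(a, b, bound) if v in window}
multiples = {v for v in window if v % g == 0}
```

Within the window, the combinations ma + nb and the multiples of gcd(a, b) are the same set, as the theorem predicts.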


We have seen a more interesting example in $\mathbb{Z}[\sqrt{-5}]$, where


\[
\begin{aligned}
I &= \{2m + (1+\sqrt{-5})n : m, n \in \mathbb{Z}\} \\
  &= \{2u + (1+\sqrt{-5})v : u, v \in \mathbb{Z}[\sqrt{-5}]\}
\end{aligned}
\]


is a nonprincipal ideal. In Section 5.1, we found this ideal by looking for the
gcd of 2 and $1+\sqrt{-5}$ in $\mathbb{Z}[\sqrt{-5}]$, and indeed it is the gcd of the principal
ideals (2) and $(1+\sqrt{-5})$ by formula (*).


Remark. Now that “divides” is defined to mean “contains,” the ascending 
chain condition (ACC) defining Noetherian rings in Section 5.4 says that there 
is no infinite descending chain of divisors. Indeed, Noether (1926) called ACC 
the “divisor chain theorem.” Finiteness of divisor chains is certainly a natural 
assumption when one hopes to prove the existence of prime factorization. It 
is a distant echo of the argument used by Euclid (Section 1.1) to prove the 
existence of prime factorization in N. 
Next, we have to define “prime” for ideals. 


Exercises


1. Show that the ideal J in the exercises to Section 5.1 equals the gcd of (3)
and $(1+\sqrt{-5})$.


I and J also have the conjugate ideals:


\[
\begin{aligned}
\bar{I} &= \{2m + (1-\sqrt{-5})n : m, n \in \mathbb{Z}\}, \\
\bar{J} &= \{3m + (1-\sqrt{-5})n : m, n \in \mathbb{Z}\}.
\end{aligned}
\]


2. Check that $\bar{I}$ and $\bar{J}$ are ideals by verifying their closure under
multiplication by $u \in \mathbb{Z}[\sqrt{-5}]$.
3. Of which principal ideals is $\bar{I}$ (and similarly $\bar{J}$) the gcd?
4. Since “to divide is to contain,” how should the least common multiple
(lcm) of two ideals be defined?
5. Test your definition of lcm on the principal ideals (a) and (b) in Z.
9.2 Prime Ideals


Before stating the definition of a prime ideal in an arbitrary ring, let us recall 
Euclid’s result about prime divisors p € Z (Section 1.3). 


Prime divisor property. If a prime p divides ab, then p divides a or p
divides b. □


An equivalent statement in terms of ideals in Z is:

If $ab \in (p)$, then $a \in (p)$ or $b \in (p)$.

This equivalent of the prime divisor property prompts the definition of prime
ideal:


Definition. An ideal I in a ring R is prime if, for any $a, b \in R$, $ab \in I$ implies
$a \in I$ or $b \in I$.


Thus, the principal ideal (p) is a prime ideal of Z by the prime divisor
property. An example of a prime ideal which is not principal is the ideal of
$\mathbb{Z}[\sqrt{-5}]$,


\[ I = \{2m + (1+\sqrt{-5})n : m, n \in \mathbb{Z}\}. \]


In Section 5.4 we found that I is a maximal ideal, and this implies it is prime,
because all maximal ideals are prime. This fact can be proved directly, but a 
more enlightening proof deduces it from a characterization of prime ideals like 
the characterization of maximal ideals in Section 5.4. 


Characterization of prime ideals. An ideal I in a ring R is prime if and only
if R/I is a domain.


Proof. Observe the following equivalences, where [x] denotes the congruence
class of $x \in R$, mod I:


\[
\begin{aligned}
I \text{ is a prime ideal of } R &\iff (ab \in I \Rightarrow a \in I \text{ or } b \in I) \\
&\iff ([a][b] = [0] \Rightarrow [a] = [0] \text{ or } [b] = [0]) \\
&\iff R/I \text{ has no zero divisors} \\
&\iff R/I \text{ is a domain.}
\end{aligned}
\]
□


Corollary. A maximal ideal is prime.


Proof. If I is a maximal ideal of R, then R/I is a field, by the characterization
of maximal ideals in Section 5.4. And a field is a domain. □
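
For the principal ideals (n) of Z with n ≥ 2, both characterizations can be tested by brute force on the quotient Z/nZ. The Python sketch below is a minimal illustration under that assumption; the function names and the range of n are ours. It does not touch the zero ideal, which is prime but not maximal in Z.

```python
def is_prime(n):
    """Trial-division primality test for ordinary integers."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def has_zero_divisors(n):
    """True if some product of nonzero classes in Z/nZ is the zero class."""
    return any((a * b) % n == 0 for a in range(1, n) for b in range(1, n))

def is_field(n):
    """True if every nonzero class in Z/nZ has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))

# For each n: (n is prime, Z/nZ is a domain, Z/nZ is a field).
results = {n: (is_prime(n), not has_zero_divisors(n), is_field(n))
           for n in range(2, 31)}
```

For these n the three entries of each triple agree, so among the nonzero ideals of Z the prime ideals and the maximal ideals coincide.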


https://doi.org/10.1017/97810090041 38.011 Published online by Cambridge University Press 




One might think that maximality ought to be the same as prime. After all,
since “to contain is to divide,” a maximal ideal is one divisible only by itself
and the whole ring, which is the principal ideal (1). However, the prime divisor
property is what we really demand of primes, and this is a weaker property
than maximality.

Not every prime ideal is maximal, as we can show by finding an example
where R/I is a domain but not a field. This happens in the polynomial ring
R = Z[x], where I = (x) is an ideal and $R/I \cong \mathbb{Z}$ (since quotienting by (x)
amounts to setting x = 0). However, in the case that motivates us, the rings $\mathbb{Z}_E$
of algebraic integers, nonzero prime ideals are maximal, as we will see in
Section 9.4.

Thus, we are motivated to study rings in which all prime ideals are maximal, 
as well as having the other key properties of rings of algebraic integers: being 
integrally closed and Noetherian. These rings were first singled out by Noether 
(1926), and these particular defining conditions were given by Krull (1928). 


Definition. A Dedekind domain¹ is a domain that is integrally closed,
Noetherian, and in which all nonzero prime ideals are maximal.


Exercises


A direct proof that maximal ideals are prime is quite short, and it involves an
argument similar to the proof of the prime divisor property in Section 1.3.


1. Suppose I is a maximal ideal in R, $ab \in I$, and $a \notin I$. Show that the ideal
$\{ma + i : m \in R,\ i \in I\} = R$ and, in particular, 1 = ma + i for some
$m \in R$ and $i \in I$.

2. Deduce from exercise 1 that $b \in I$, and hence that I is a prime ideal.


Z[x] is also a simple example of a ring that is not a PID (unlike Q[x], or F[x]
for any field F).


3. Suppose on the contrary that the ideal $\{2m + xn : m, n \in \mathbb{Z}[x]\}$ of Z[x] is a
principal ideal (p). Deduce that p divides 2 and p divides x, and derive a
contradiction.


¹ These are also known as Dedekind rings, but it seems useful to be reminded that being a
domain is part of their definition.
9.3 Products of Ideals


We were able to define prime ideals, rather surprisingly, without defining the 
product of ideals. However, we will need this product, and its definition is: 


Definition. The product of ideals A and B in a ring R is the set of finite sums
of products ab, where $a \in A$ and $b \in B$. That is


\[ AB = \{a_1b_1 + \cdots + a_nb_n : a_1, \ldots, a_n \in A;\ b_1, \ldots, b_n \in B\}. \]


With this definition, we can restate the prime divisor property for ideals: 


Prime ideal divisor property. If a prime ideal P divides a product AB of
ideals, then P divides A or P divides B.


Proof. Suppose that $P \supseteq AB$ but that $P \not\supseteq A$. We wish to prove that $P \supseteq B$.

Since $P \not\supseteq A$, there is an $a \in A$ with $a \notin P$. For any $b \in B$ we have
$ab \in AB \subseteq P$, so $b \in P$ by definition of prime ideal. This means $P \supseteq B$, as
required. □


We can in fact take the prime ideal divisor property as the definition of prime
ideal. If I has the prime ideal divisor property, and if $ab \in I$ and $a \notin I$, then
$b \in I$ by considering the principal ideals A = (a) and B = (b), whose product
(ab) is contained in I.

It might now seem that we are close to proving unique prime factorization 
for ideals. But, not so fast. The trouble is that existence of prime factorization 
is no longer obvious. Existence depends on properties of Dedekind domains, 
starting with the Noetherian property. 


Prime ideal products in Noetherian domains. In a Noetherian domain R
each nonzero ideal contains a product of prime ideals.


Proof. If there are nonzero ideals of R not containing products of prime ideals,
then there is a maximal one among them, B, since R is Noetherian.

B is not prime, by hypothesis, so there are $x, y \notin B$ with $xy \in B$. Then the
ideals B + Rx (the sums of elements from B and Rx) and B + Ry (similarly)
properly contain B so, by the maximal property of B,


\[ B + Rx \supseteq P_1 \cdots P_m, \qquad B + Ry \supseteq Q_1 \cdots Q_n, \]

for prime ideals $P_1, \ldots, P_m, Q_1, \ldots, Q_n$.


But now, since $xy \in B$, we have $Rxy \subseteq B$. It follows that $(B + Rx)(B + Ry) \subseteq B$,
and therefore


\[ B \supseteq (B + Rx)(B + Ry) \supseteq P_1 \cdots P_m Q_1 \cdots Q_n, \]

contrary to the definition of B. So each nonzero ideal of R contains a product
of prime ideals. □


We note that, since “to divide is to contain,” the conclusion of the theorem
says that each nonzero ideal I divides a product $P_1 \cdots P_m$ of prime ideals. One
is tempted to conclude that


\[ P_1 \cdots P_m = IJ \quad \text{for some ideal } J. \]


However, we have not yet proved that a “divisor” I of some K is necessarily a
factor of a product IJ equal to K. This will emerge indirectly in Section 9.6,
after we introduce “inverse” ideals in Section 9.5.


Exercises 


With the definition of product of ideals we are finally able to explain why the
different factorizations of 6 in $\mathbb{Z}[\sqrt{-5}]$, namely $2 \cdot 3$ and $(1+\sqrt{-5})(1-\sqrt{-5})$,
can be split into the same product of prime ideals. We have already established
that the ideals I and J from Section 5.1 are maximal, and hence prime.


1. Show similarly that the ideals $\bar{I}$ and $\bar{J}$ are prime.


We have also established that


2. Show that all members of the ideal $I^2$ are multiples of 2, and that they
include 2. Hence, $I^2 = (2)$.

3. Show that all members of the ideal $J\bar{J}$ are multiples of 3, and that they
include 3. Hence, $J\bar{J} = (3)$.


We now have the prime ideal factorization $(2) \cdot (3) = I^2 J \bar{J}$, and we can check
that this is a prime ideal factorization of $(1+\sqrt{-5})(1-\sqrt{-5})$ as follows.


4. Show that the principal ideal $(1+\sqrt{-5}) = IJ$.
5. Show similarly that $(1-\sqrt{-5}) = \bar{I}\bar{J}$, and finally that $\bar{I} = I$, so


\[ (1+\sqrt{-5})(1-\sqrt{-5}) = I^2 J \bar{J}. \]
9.4 Prime Ideals in Algebraic Number Rings


To show that the theory of Dedekind domains applies to the algebraic integer
rings $\mathbb{Z}_E$, it remains to show that the nonzero prime ideals of $\mathbb{Z}_E$ are maximal,
since we know that $\mathbb{Z}_E$ is Noetherian by Section 8.4.1 and integrally closed by
Section 8.6. To do this, we begin with a basic lemma about domains.


Finite domains. Any finite domain is a field. 


Proof. Suppose D is a finite domain with nonzero elements $d_1, \ldots, d_k$. Then
if d is any nonzero element of D, each $dd_i \ne 0$ because D has no zero divisors.
Also, if $d_i \ne d_j$, then $dd_i \ne dd_j$ because $dd_i = dd_j$ implies $d(d_i - d_j) = 0$,
which is impossible because D has no zero divisors.


It follows that $dd_1, \ldots, dd_k$ are the nonzero elements of D again, at worst
in a different order. In particular, some $dd_i = 1$. Thus, each nonzero element
$d \in D$ has an inverse, so D is a field. □
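
The permutation argument in this proof can be watched in a small case. The Python sketch below takes D = Z/7Z (our choice of example) and lists, for each nonzero d, the products of d with the nonzero elements; the helper name `multiplication_row` is hypothetical. The last line shows how the argument breaks down in Z/6Z, which is not a domain.

```python
def multiplication_row(d, n):
    """The products d*1, d*2, ..., d*(n-1) in Z/nZ."""
    return [(d * e) % n for e in range(1, n)]

p = 7
nonzero = list(range(1, p))

# Each row is a rearrangement of the nonzero elements of Z/7Z.
rows = {d: multiplication_row(d, p) for d in nonzero}

# The position of 1 in the row of d identifies the inverse of d.
inverses = {d: nonzero[rows[d].index(1)] for d in nonzero}

# In Z/6Z the row of 2 contains 0, so it is not a rearrangement.
row_of_2_mod_6 = multiplication_row(2, 6)
```

Reading off where 1 occurs in each row is exactly the step "some $dd_i = 1$" of the proof.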


Prime ideals of $\mathbb{Z}_E$. If $E \supseteq \mathbb{Q}$ is a field extension of finite degree, then each
nonzero prime ideal in $\mathbb{Z}_E$ is maximal.


Proof. Suppose $P \subseteq \mathbb{Z}_E$ is a nonzero prime ideal. Then $\mathbb{Z}_E/P$ is a domain by the
characterization of prime ideals in Section 9.2. It now suffices to
prove that the domain $\mathbb{Z}_E/P$ is finite, and hence a field, since this implies
P is maximal by the characterization of maximal ideals in Section 5.4.

To do this, we choose a nonzero $x \in P$ and use it to find a nonzero ordinary
integer $a \in P$. Since $x \in \mathbb{Z}_E$, we have a minimal equation


\[ x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0, \quad \text{where } a_0, \ldots, a_{n-1} \in \mathbb{Z}. \]


Notice first that $a_0 \ne 0$, otherwise we could divide the equation by x, contrary
to minimality. Then rearranging gives


\[ a_0 = x\left(-a_1 - a_2 x - \cdots - a_{n-1}x^{n-2} - x^{n-1}\right), \]


which shows that $a_0$ is a multiple of x, and hence $a_0 \in P$. Thus, P includes
the nonzero ordinary integer $a_0$, which we will call a.

Now recall from Section 8.2 that $\mathbb{Z}_E$ has a finite integral basis $\omega_1, \ldots, \omega_m$.
Since a is in P, its multiples $a\omega_1, \ldots, a\omega_m$ are also in P. It follows, by adding
or subtracting such multiples, that each element $b_1\omega_1 + \cdots + b_m\omega_m \in \mathbb{Z}_E$ is
congruent, mod P, to one with $|b_1|, \ldots, |b_m| < |a|$.

There are only finitely many elements of this form, and therefore $\mathbb{Z}_E/P$ is
finite, as required. □
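
As a concrete instance of the two steps in this proof, take $x = 1+\sqrt{-5}$ in $\mathbb{Z}[\sqrt{-5}]$, whose minimal equation is $x^2 - 2x + 6 = 0$. The Python sketch below represents $a + b\sqrt{-5}$ as the pair (a, b); the example, the pair representation, and the helper names are assumptions made for illustration. It reduces coefficients to the range 0, ..., a − 1 instead of bounding absolute values, which serves the same purpose.

```python
def mul(u, v):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)), as a pair of integer coefficients."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)

x = (1, 1)  # x = 1 + sqrt(-5)

# Check the minimal equation x^2 - 2x + 6 = 0 coefficient by coefficient.
x_squared = mul(x, x)
equation = (x_squared[0] - 2 * x[0] + 6, x_squared[1] - 2 * x[1])

# Rearranging: a_0 = x * (-a_1 - x) = x * (2 - x), an ordinary integer.
cofactor = (2 - x[0], -x[1])
a0 = mul(x, cofactor)

def reduce_mod(element, a):
    """Reduce both coefficients mod a, using the multiples a*1 and a*sqrt(-5)."""
    return tuple(c % a for c in element)

# Every element is congruent to one of finitely many reduced pairs.
residues = {reduce_mod((b1, b2), a0[0])
            for b1 in range(-20, 21) for b2 in range(-20, 21)}
```

Any prime ideal P containing x therefore contains the ordinary integer computed as `a0`, and the quotient by P has at most `len(residues)` elements.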
With this theorem we have proved that the ring of integers $\mathbb{Z}_E$ in any
algebraic number field is a Dedekind domain. Thus, the results we now 
prove about Dedekind domains, notably unique prime ideal factorization, are 
guaranteed to apply where number theory wants them. 


Exercises 


1. Using the fact, from the exercises to Section 8.4, that a nonzero ideal I of
$\mathbb{Z}_E$ has the same rank as $\mathbb{Z}_E$, show that $\mathbb{Z}_E/I$ is finite for any nonzero I.

2. Why is the ring $\mathbb{Z}_E/I$ not necessarily a field?

3. Find how many elements are in $\mathbb{Z}[\sqrt{-5}]/(2)$.

4. Find elements that show $\mathbb{Z}[\sqrt{-5}]/(2)$ is not a field.


9.5 Fractional Ideals 


Our first step in the general theory of Dedekind domains is to expand the 
concept of ideal with an eye towards creating inverse ideals and legitimizing 
cancellation in products of ideals. 


Definition. If R is a domain, then an R-submodule $I \subseteq \mathrm{Frac}(R)$ is called a
fractional ideal if there is a nonzero $d \in R$ such that $dI \subseteq R$.


Thus, the elements of I have d as a “common denominator.” Ordinary (or
“integral”) ideals are those with d = 1. In any case, since I is an R-submodule
of Frac(R), dI is an R-submodule of R, and hence an ideal. If R is Noetherian,
then dI is finitely generated, and hence so is I as an R-submodule of Frac(R).

We use fractional ideals to obtain inverses of ideals in a Dedekind domain.
An inverse $I'$ of an ideal I satisfies $II' = R$, since R is the identity for
multiplication of ideals (defining the product of fractional ideals the same way as for
ideals).


Inverses of maximal ideals. In a Dedekind domain R that is not a field, each
maximal ideal M has an inverse $M'$.


Proof. Let M be a maximal ideal of R. Then $M \ne (0)$ since R is not a field.
Let


\[ M' = \{x \in \mathrm{Frac}(R) : xM \subseteq R\}. \tag{*} \]


Then $M'$ is an R-submodule of Frac(R). Any nonzero $d \in M$ is a common denominator
for all $x \in M'$, so $M'$ is a fractional ideal. It remains to show that $MM' = R$.
Certainly, the definition (*) implies $MM' \subseteq R$. Also $R \subseteq M'$ because M is
an ideal, hence $xM \subseteq M \subseteq R$ for all $x \in R$. Since M is maximal and


\[ M \subseteq MM' \subseteq R \]

we have either $MM' = R$ or $MM' = M$. It now remains to show $MM' \ne M$.
If, on the contrary, $MM' = M$, then

\[
\begin{aligned}
x \in M' &\Rightarrow xM \subseteq M \\
&\Rightarrow x^2M \subseteq M \ \ldots \\
&\Rightarrow x^nM \subseteq M \text{ for any } n, \text{ by induction} \\
&\Rightarrow \text{any nonzero } d \in M \text{ is a common denominator for all } x^n \\
&\Rightarrow R[x] \text{ is a fractional ideal of } R.
\end{aligned}
\]

Since R is Noetherian, the fractional ideal R[x] is a finitely generated
R-module by the remark above (following the definition of fractional ideal).
This means x is integral over R by the equivalents of “integral over” proved
in Section 8.6. But R is Dedekind, hence integrally closed, so $x \in R$. Thus,
$MM' = M$ leads to $M' = R$, which we finally have to show is impossible.

Take a nonzero $a \in M$. The ideal Ra contains a product $P_1 \cdots P_m$ of
nonzero prime ideals by the theorem on prime ideal products in Section 9.3.
Take m as small as possible and notice that


\[ M \supseteq Ra \supseteq P_1 \cdots P_m \Rightarrow M \supseteq \text{some } P_i \]


by the prime ideal divisor property of Section 9.3. Say $M \supseteq P_1$ by suitable
renumbering. In fact $M = P_1$, because the nonzero prime ideal $P_1$ is maximal
in a Dedekind domain. If we put $B = P_2 \cdots P_m$, then


\[ Ra \supseteq MB \quad \text{and} \quad Ra \not\supseteq B \quad \text{since } m \text{ is minimal.} \]

Thus, there is a $b \in B$ with $b \notin Ra$. But

\[ MB \subseteq Ra \Rightarrow Mb \subseteq Ra \Rightarrow Mba^{-1} \subseteq R \Rightarrow ba^{-1} \in M', \]


by definition of $M'$. Yet $b \notin Ra$ implies $ba^{-1} \notin R$, so $R \ne M'$, as required.
□


Notice that this proof of the existence of inverses uses the Dedekind 
domain properties to the full: the Noetherian property, integral closure, and 
the maximality of prime ideals. In Section 9.7 we will show that these are in 
fact the “right” properties to prove existence of inverses, because existence of 
inverse ideals implies the Dedekind domain properties. 


Remark. The existence of an inverse ideal $I^{-1}$ is obvious in the case of a
principal ideal I = Rd for some nonzero $d \in R$. In this case it is clear that
$Rd^{-1}$ is a fractional ideal and that $(Rd)(Rd^{-1}) = R$, so $Rd^{-1} = (Rd)^{-1}$. We
use inverses of principal ideals in the next section to prove, among other things,
that all nonzero fractional ideals of a Dedekind domain have inverses.


Exercises 


Knowing the inverses of principal ideals, we can also calculate the inverses of 
certain other ideals. 


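1. Describe the inverse ideal $(2)^{-1}$ in both $\mathbb{Q}$ and $\mathbb{Q}(\sqrt{-5})$.

2. Explain why $I^2(2)^{-1} = \mathbb{Z}[\sqrt{-5}]$, where $I = \{2m + (1+\sqrt{-5})n : m, n \in \mathbb{Z}\}$,
and hence find $I^{-1}$.

3. Similarly find $J^{-1}$, where $J = \{3m + (1+\sqrt{-5})n : m, n \in \mathbb{Z}\}$.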


9.6 Prime Ideal Factorization 


We now combine the existence of a product of prime ideals divisible by a given
ideal (Section 9.3) with inverses of maximal ideals from the previous section
to prove existence and uniqueness of prime ideal factorization. It follows that
the nonzero fractional ideals form an Abelian group.


Unique prime ideal factorization. If R is a Dedekind domain, then each
nonzero fractional ideal $I \subseteq \mathrm{Frac}(R)$ is uniquely expressible in the form


\[ I = P_1^{n_1} \cdots P_m^{n_m} \quad \text{for some prime ideals } P_1, \ldots, P_m \ne (0) \text{ and } n_1, \ldots, n_m \in \mathbb{Z}, \]


and the nonzero fractional ideals form an Abelian group under the product
operation. Also, if $I \subseteq R$, the exponents $n_i$ are nonnegative.


Proof. By definition of the fractional ideal I, $dI \subseteq R$ for some nonzero $d \in R$.
Since dI, Rd are integral ideals with $I = (dI)(Rd)^{-1}$, it suffices to prove
unique prime factorization of integral ideals. We first prove existence of the
factorization.

If R contains nonzero ideals that are not products of prime ideals, there is a maximal
one among them, J, since R is Noetherian. Then $J \ne R$, since $R = MM^{-1}$ for
any maximal (and hence prime) ideal M by the theorem in the previous section.
So $J \subseteq P$ for some maximal ideal P, namely an ideal maximal among the proper
ideals of R containing J.

The maximal ideal P is prime, and it has an inverse $P^{-1}$ since R is Dedekind.
Also,


\[ J \subseteq P \Rightarrow JP^{-1} \subseteq PP^{-1} = R, \quad \text{and} \quad P^{-1} \supseteq R \Rightarrow JP^{-1} \supseteq J, \quad \text{so we have } J \subseteq JP^{-1} \subseteq R. \]

Now $JP^{-1} \ne J$ because

\[
\begin{aligned}
JP^{-1} = J \text{ and } x \in P^{-1} &\Rightarrow xJ \subseteq J \\
&\Rightarrow x^nJ \subseteq J \text{ for all } n \\
&\Rightarrow x \text{ integral over } R \\
&\Rightarrow x \in R \text{ as in the previous theorem.}
\end{aligned}
\]


And this is impossible, because $P^{-1} \subseteq R \Rightarrow PP^{-1} \subseteq P \ne R$.
Thus we have $J \subsetneq JP^{-1}$, so the maximality of J implies


\[ JP^{-1} = P_1 \cdots P_m \quad \text{for some prime ideals } P_1, \ldots, P_m, \]

and therefore

\[ J = PP_1 \cdots P_m, \quad \text{contradicting the definition of } J. \]


So in fact each nonzero ideal is the product of prime ideals. Uniqueness follows by the
prime ideal divisor property of Section 9.3 (just as it did for Z in Section 1.3).

It also follows that each nonzero fractional ideal has an inverse, obtained by
reversing the signs of all the exponents in its prime factorization. Then, since
the product of ideals is obviously associative and commutative, the nonzero
fractional ideals form an Abelian group. □
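
In the simplest Dedekind domain, Z, every nonzero fractional ideal is generated by a positive rational number q, and the theorem reduces to the factorization of q into prime powers with integer exponents. The Python sketch below records such an ideal by its table of exponents; the representation and the function names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

def prime_exponents(n):
    """Prime factorization of a positive integer as {prime: exponent}."""
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def ideal_exponents(q):
    """Exponents n_p with (q) = product of (p)^(n_p), for a positive rational q."""
    exps = prime_exponents(q.numerator)
    for p, e in prime_exponents(q.denominator).items():
        exps[p] = exps.get(p, 0) - e
    return exps

def multiply(e1, e2):
    """Product of fractional ideals: add exponents, dropping zero entries."""
    out = dict(e1)
    for p, e in e2.items():
        out[p] = out.get(p, 0) + e
    return {p: e for p, e in out.items() if e != 0}

def inverse(e):
    """Inverse fractional ideal: reverse the sign of every exponent."""
    return {p: -n for p, n in e.items()}

I = ideal_exponents(Fraction(12, 35))
unit_ideal = multiply(I, inverse(I))  # empty table, standing for (1) = Z
```

Multiplying ideals adds exponent tables, so the group of nonzero fractional ideals of Z behaves like integer vectors indexed by the primes, with the empty table as identity.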


Another consequence of the existence of inverses is that if an ideal $J$ divides
an ideal $I$, then $I = JK$ for some ideal $K$. In fact, we have:


Division implies a product. If $I$ and $J$ are ideals in $R$ and $J^{-1}$ exists in
$\operatorname{Frac}(R)$, then there is an ideal $K \subseteq R$ with $I = JK$ if and only if $J \supseteq I$
($J$ divides $I$).


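Proof. If $J \supseteq I$, the solution $K = IJ^{-1}$ of $I = JK$ is an ideal of $R$ because


$$J \supseteq I \;\Rightarrow\; JJ^{-1} \supseteq IJ^{-1} \;\Rightarrow\; R \supseteq K.$$

Conversely, if $I = JK$ for an ideal $K \subseteq R$ then $J \supseteq I$ because


$$R \supseteq K \;\Rightarrow\; JR \supseteq JK \;\Rightarrow\; J \supseteq I. \qquad \Box$$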


This divisibility lemma finally gives the answer to the question, raised in 
Section 9.2, whether the concept of “divides” has the right relationship with 
the concept of “product.” 

Moreover, the divisibility lemma enables us to prove that prime ideals do
not split into nontrivial products of ideals, so the concept of “prime” also has
the right relationship with the concept of “product.”


Indivisibility of prime ideals. If $P$ is a nonzero prime ideal of a ring $R$ whose
nonzero ideals have inverses, then $P = AB$ only if $A = P$ or $B = P$.


Proof. $P = AB$ implies $P \supseteq AB$, in which case $P \supseteq A$ or $P \supseteq B$, by the
prime ideal divisor property of Section 9.2.

But $P = AB$ and $A \subseteq R$ implies $B \supseteq P$ by the divisibility lemma. If
$P \supseteq B$, this gives $B = P$, and hence $A = R$ by multiplying each side of
$P = AB$ by $B^{-1}$. Otherwise $P \supseteq A$, in which case $B \subseteq R$ similarly implies
$A = P$ and $B = R$. □


Exercises 


Another way in which divisibility of ideals has the “right” relationship with the 
concept of product may be seen with the concepts of greatest common divisor 
and least common multiple. 


1. If $A = P_1^{n_1} \cdots P_k^{n_k}$, for prime ideals $P_1, \ldots, P_k \neq (0)$, is an integral ideal
with positive $n_1, \ldots, n_k \in \mathbb{Z}$, and if $C$ divides $A$, deduce from
the divisibility lemma that


$$C = P_1^{l_1} \cdots P_k^{l_k} \quad\text{for nonnegative } l_1 \leq n_1, \ldots, l_k \leq n_k \in \mathbb{Z}.$$


2. Deduce from exercise 1 that if $P_1, \ldots, P_k$ are the prime ideals in the prime
ideal factorization of an integral ideal $A$ (as above, but with nonnegative
exponents $n_i$) and


$$B = P_1^{m_1} \cdots P_k^{m_k} \quad\text{for nonnegative } m_1, \ldots, m_k \in \mathbb{Z},$$

then

$$\gcd(A, B) = P_1^{\min(m_1, n_1)} \cdots P_k^{\min(m_k, n_k)}.$$


3. Prove the corresponding result for least common multiple (lcm) and hence 
show 


gcd(A, B)lcm(A, B) = AB. 
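Before attempting the proofs, it may help to see the statements of exercises 2 and 3 in the special case $R = \mathbb{Z}$, where the ideals $(a)$ and $(b)$ have gcd $(\gcd(a, b))$ and lcm $(\operatorname{lcm}(a, b))$. The sketch below is a numerical sanity check, not a proof; the function names and the sample values 360 and 2100 are our own choices.

```python
from math import gcd, lcm

def exponents(n, primes):
    """Exponent vector of the positive integer n with respect to the listed primes."""
    vec = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        vec.append(e)
    if n != 1:
        raise ValueError("n has a prime factor outside the given list")
    return vec

def from_vector(vec, primes):
    """Rebuild the integer from its exponent vector."""
    result = 1
    for p, e in zip(primes, vec):
        result *= p ** e
    return result

def gcd_via_exponents(a, b, primes):
    """gcd obtained by taking the minimum exponent at each prime."""
    pairs = zip(exponents(a, primes), exponents(b, primes))
    return from_vector([min(m, n) for m, n in pairs], primes)

def lcm_via_exponents(a, b, primes):
    """lcm obtained by taking the maximum exponent at each prime."""
    pairs = zip(exponents(a, primes), exponents(b, primes))
    return from_vector([max(m, n) for m, n in pairs], primes)

primes = [2, 3, 5, 7]
a, b = 360, 2100
g = gcd_via_exponents(a, b, primes)
l = lcm_via_exponents(a, b, primes)
print(g == gcd(a, b), l == lcm(a, b), g * l == a * b)
```

The identity $\gcd \cdot \operatorname{lcm} = AB$ comes down to $\min(m, n) + \max(m, n) = m + n$ at each prime.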


9.7 Invertibility and the Dedekind Property 


In the previous section we showed that any Dedekind domain R has unique 
prime ideal factorization and in Frac(R) there is an inverse for each nonzero 
fractional ideal. We now show that the latter property implies the Dedekind
property, so they are actually equivalent, as was discovered by Krull (1935).²
Invertibility has already given quick answers to questions on the relation 
between divisibility (in the sense of containment) and products of ideals. 
Invertibility also makes short work of the Dedekind properties. 


Invertibility implies the Dedekind properties. If $R$ is a ring whose nonzero
ideals have inverses, then $R$ is a Dedekind domain; that is,


1. it is Noetherian, 
2. it is integrally closed, and 
3. each of its nonzero prime ideals is maximal. 


Proof. First observe that R must be a domain to have the invertibility property, 
because if nonzero $a, b \in R$ and $ab = 0$, then the nonzero ideals $(a)$ and $(b)$
have product (0), so neither (a) nor (b) can have an inverse. We now see why 
invertibility implies the properties 1, 2, and 3. 

To prove 1, suppose $I \neq (0)$ is an ideal of $R$, with inverse $I'$. Then since
$II' = R$, which includes 1, there must be $x_1, \ldots, x_n \in I$ and $x'_1, \ldots, x'_n \in I'$
with


$$x_1x'_1 + \cdots + x_nx'_n = 1.$$

Now take any $x \in I$ and notice that this equation gives

$$x \in I \;\Rightarrow\; x = (xx'_1)x_1 + \cdots + (xx'_n)x_n$$
$$\Rightarrow\; x \in Rx_1 + \cdots + Rx_n, \quad\text{because each } xx'_i \in II' = R$$
$$\Rightarrow\; x_1, \ldots, x_n \text{ generate } I.$$

Thus, each ideal of $R$ is finitely generated, so $R$ is Noetherian.
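The argument for property 1 can be followed on a concrete ideal. The sketch below is an illustration of our own, not an example taken from the text: it uses $I = (2, 1 + \sqrt{-5})$ in $\mathbb{Z}[\sqrt{-5}]$ and assumes that $I' = \frac{1}{2}I$ is its inverse (which follows from $I^2 = (2)$). Elements $a + b\sqrt{-5}$ are stored as pairs `(a, b)` of rationals, and all names are hypothetical.

```python
from fractions import Fraction

def mul(u, v):
    """Multiply a + b*sqrt(-5) and c + d*sqrt(-5), each stored as a pair (a, b)."""
    a, b = u
    c, d = v
    return (a * c - 5 * b * d, a * d + b * c)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def total(terms):
    """Sum of a sequence of elements."""
    acc = (Fraction(0), Fraction(0))
    for t in terms:
        acc = add(acc, t)
    return acc

half = Fraction(1, 2)
two = (Fraction(2), Fraction(0))
w = (Fraction(1), Fraction(1))            # the element 1 + sqrt(-5)

# Elements x_i of I and x'_i of I' = (1/2) I, chosen so that the products sum to 1.
xs = [w, w, two]
x_primes = [(Fraction(1), Fraction(0)),   # 1 = 2/2
            (-half, -half),               # -(1 + sqrt(-5))/2
            (Fraction(-1), Fraction(0))]  # -1 = -2/2

one = total(mul(x, xp) for x, xp in zip(xs, x_primes))

# Any x in I is then an R-combination of the x_i with coefficients x * x'_i.
x = add(mul(two, w), mul(w, (Fraction(0), Fraction(1))))   # 2(1 + sqrt(-5)) + (1 + sqrt(-5))sqrt(-5)
coeffs = [mul(x, xp) for xp in x_primes]
rebuilt = total(mul(c, xi) for c, xi in zip(coeffs, xs))
print(one, coeffs, rebuilt == x)
```

Each coefficient `x * x'_i` has integer coordinates, as the proof predicts, because it lies in $II' = R$.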


To prove 2, suppose that $u$ is integral over $R$ and that $u \in \operatorname{Frac}(R)$. Then,
on the one hand, $u$ satisfies an equation


$$x^n + a_{n-1}x^{n-1} + \cdots + a_0 = 0, \quad\text{where } a_0, \ldots, a_{n-1} \in R.$$


So $R + Ru + \cdots + Ru^{n-1}$ includes $u^n$, hence also $u^{n+1}, u^{n+2}, \ldots$, and indeed
all of $R[u]$, by the usual argument. On the other hand, $u = a/b$ for some
$a, b \in R$, so $b^{n-1}$ is a common denominator of $1, u, \ldots, u^{n-1}$ and hence for all
of $R[u]$. It follows that $R[u]$ is a nonzero fractional ideal $I \subseteq \operatorname{Frac}(R)$. It is
also clear, since $I$ consists of the polynomials in $u$, that $II = I$. Now, if each
nonzero fractional ideal has an inverse, multiplying $II = I$ by an inverse of $I$
gives $I = R$.

In particular, since $u \in I$ implies $u \in R$, $R$ is integrally closed.


2 However, it should be mentioned that Noether (1926) had already shown unique prime ideal
factorization equivalent to her formulation of the Dedekind domain properties.

To prove 3, suppose that $P$ is a nonzero prime ideal of $R$. If $P$ is not
maximal, then there is an ideal $Q$ with $P \subset Q \subset R$ (both inclusions strict). This implies, by the
divisibility lemma of the previous section, that $P = KQ$ for some ideal $K$.
Also, $K \neq R$ because $P \neq Q$. So in the product $P = KQ$, we have $K \neq R$
and $Q \neq R$, contradicting the indivisibility of prime ideals proved at the end
of the last section. □


Now that we have proved that invertibility implies the Dedekind domain 
properties, which imply unique prime ideal factorization, we could close the 
circle by proving that unique prime ideal factorization implies invertibility. 
Assuming unique prime ideal factorization of fractional ideals, as proved in 
the previous section, the existence of inverses is obvious. As seen there, we 
obtain the inverse of a nonzero fractional ideal by reversing the signs of all the 
exponents in its prime ideal factorization. 

If we assume unique prime ideal factorization only for integral ideals (that
is, ideals $I \subseteq R$), then it is still possible to show invertibility of nonzero
ideals. The proof may be seen in (Zariski and Samuel, 1958, p. 273). However, 
the proof involves some tricky manipulations, which we take as a sign that 
the “right” statement of unique prime ideal factorization is the one involving 
fractional ideals. 


Exercises 


Unique prime ideal factorization has become the main focus of this chapter, as 
we searched for the abstract conditions that make it possible. However, we do 
not wish to forget the problems that called for prime ideal factorization in the 
first place. Here is one we can now solve. 

First, recall the example in Section 2.9, where Euler worked in $\mathbb{Z}[\sqrt{-2}]$ to
find all the (ordinary) integer solutions of


$$y^3 = x^2 + 2 = (x + \sqrt{-2})(x - \sqrt{-2}).$$


He began by assuming $\gcd(x + \sqrt{-2}, x - \sqrt{-2}) = 1$, and we showed in
Section 2.9 that this assumption is valid when $x$ is odd.
A similar problem arises with the equation


$$y^3 = x^2 + 5 = (x + \sqrt{-5})(x - \sqrt{-5}),$$




where we would like to know that $\gcd(x + \sqrt{-5}, x - \sqrt{-5}) = 1$. This is false
when $x$ is odd (take $x = 5$, for example), but we can show it is true when $x$ is
even. Since $\mathbb{Z}[\sqrt{-5}]$ is not a principal ideal domain, this means showing that
no prime ideal divides both $x + \sqrt{-5}$ and $x - \sqrt{-5}$, in the sense of dividing
the ideals $(x + \sqrt{-5})$ and $(x - \sqrt{-5})$.


1. Suppose $P$ is a prime ideal that divides $x + \sqrt{-5}$ and $x - \sqrt{-5}$. Explain
why $2x \in P$ and $2\sqrt{-5} \in P$.

2. Deduce that either $2 \in P$ or else $x \in P$ and $\sqrt{-5} \in P$.

3. If $2 \in P$, deduce that 2 divides $x^2 + 5 = y^3$, so $y$ is even and hence $x$ is
odd. Show that this leads to a contradiction.

4. If $x, \sqrt{-5} \in P$ then $5 \in P$ (can you see why?) and hence 5 divides $x$
(otherwise $P = (1)$, which implies it is not prime). Show that this also
leads to a contradiction.


Now, to cut a long story short, we can reach a conclusion like Euler did: if
$y^3 = x^2 + 5$, then $x + \sqrt{-5}$ is a cube $(a + b\sqrt{-5})^3$ for some $a, b \in \mathbb{Z}$.


5. Show that $x + \sqrt{-5} = (a + b\sqrt{-5})^3$ for $a, b \in \mathbb{Z}$ leads to a contradiction,
and hence that $y^3 = x^2 + 5$ has no solution in ordinary integers.
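A quick computation can make the claim of exercise 5 plausible before it is proved. The sketch below is only numerical evidence under our own choice of search bounds; the function names are hypothetical. It searches for integer solutions of $y^3 = x^2 + 5$ in a finite range, and it expands $(a + b\sqrt{-5})^3$ to look for cubes whose $\sqrt{-5}$ coefficient equals 1.

```python
from math import isqrt

def solutions(bound):
    """Pairs (x, y) with x >= 0, 1 <= y <= bound and y**3 == x**2 + 5."""
    found = []
    for y in range(1, bound + 1):
        t = y ** 3 - 5
        if t >= 0:
            x = isqrt(t)
            if x * x == t:
                found.append((x, y))
    return found

def cube(a, b):
    """Return (u, v) with (a + b*sqrt(-5))**3 == u + v*sqrt(-5)."""
    # Uses sqrt(-5)**2 = -5 and sqrt(-5)**3 = -5*sqrt(-5).
    return (a ** 3 - 15 * a * b ** 2, 3 * a ** 2 * b - 5 * b ** 3)

def cubes_with_unit_sqrt_part(bound):
    """Pairs (a, b) in [-bound, bound]^2 whose cube has sqrt(-5) coefficient 1."""
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if cube(a, b)[1] == 1]

print(solutions(2000))
print(cubes_with_unit_sqrt_part(50))
```

Both searches come back empty in these ranges. The second one hints at the algebraic reason: the coefficient of $\sqrt{-5}$ is $b(3a^2 - 5b^2)$, which would have to equal 1.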


9.8 Discussion 


As I said once before, in the introduction to (Dedekind, 1996, p. 3): 


Dedekind’s invention of ideals in the 1870s was a major turning point in the 
development of algebra. His aim was to apply ideals to number theory, but to do 
this he had to build the whole framework of commutative algebra: fields, rings, 
modules and vector spaces. These concepts, together with groups, were to form the 
core of the future abstract algebra. At the same time he created algebraic number 
theory, which became the temporary home of algebra while its core concepts were 
growing up. 


Dedekind’s theory of ideals was reworked by Dedekind himself — for 
example, fractional ideals first appear in Dedekind (1894) — then by Noether 
(1921) and Noether (1926), where the Noetherian and integrality components 
of the Dedekind domain concept are identified. Noether’s ideas were polished 
in van der Waerden (1931), from which they spread widely. After van der 
Waerden, it becomes difficult to trace their continued evolution, which includes 
greater emphasis on the maximality statement³ of the Noetherian property, as
we have seen in this chapter. The book of Zariski and Samuel (1958) was
influential, and its offshoot Samuel (1970) was particularly helpful to me in
preparing the present book.


3 In the paper that introduces the term “Noetherian,” Chevalley (1943) uses the same definition of
this property as Emmy Noether (that each ideal is finitely generated) but he calls it a
“maximal condition.” Evidently, by this time, the equivalence of finite generation with existence
of a maximal element in any set of ideals went without saying.


9.8.1 Algebraic Function Theory


In the Preface I mentioned the “second life” of Dedekind domains in algebraic
geometry. This stems from the paper Dedekind and Weber (1882), which may
be seen in English translation as Dedekind and Weber (2012). It is impossible
to do justice to algebraic geometry in a book of this size, but the story
begins with an analogy between algebraic number fields and algebraic function
fields that was sketched in the exercises to Section 8.5. The following table
summarizes the analogy.


Number Fields | Function Fields

domain $\mathbb{Z}$ | domain $\mathbb{C}[x]$

field $\mathbb{Q}$ of fractions | field $\mathbb{C}(x)$ of fractions

algebraic number $x$ satisfies $a_n x^n + \cdots + a_0 = 0$ with $a_0, \ldots, a_n \in \mathbb{Z}$ | algebraic function $y$ satisfies $a_n y^n + \cdots + a_0 = 0$ with $a_0, \ldots, a_n \in \mathbb{C}[x]$

algebraic number field $E$ is extension $E \supseteq \mathbb{Q}$ of finite dimension | algebraic function field $F$ is extension $F \supseteq \mathbb{C}(x)$ of finite dimension

integers of $E$ have $a_n = 1$ and they form a Dedekind domain | integral functions of $F$ have $a_n = 1$ and they form a Dedekind domain


Before the 19th century, algebraic functions came up mainly in calculus,
where the problem of integrating them was observed to be difficult, and to be
connected with problems of algebra and number theory. Even the class $\mathbb{C}(x)$
of rational functions posed a problem. A rational function $f(x) = p(x)/q(x)$
is a quotient of polynomials, which may be integrated by the method of partial
fractions provided $q(x)$ has a factorization into linear factors. But the existence
of such a factorization depends on the fundamental theorem of algebra. In fact,
the problem of integrating rational functions was the main motive for proving
the fundamental theorem of algebra in the first place.

When the rational functions are extended by even the square root function
(an algebraic function $y$ satisfying $y^2 - x = 0$), the difficulties multiply. A few
such functions can be integrated by rational change of variable. For example,
we can find


$$\int \frac{dx}{\sqrt{1 - x^2}}$$

by making the substitution $x = \dfrac{1 - t^2}{1 + t^2}$, which “rationalizes” the integral to


$$\int \frac{-2\,dt}{1 + t^2}.$$

The reason behind this seemingly miraculous simplification is that the function
$y = \sqrt{1 - x^2}$ satisfies the equation $x^2 + y^2 = 1$ for the unit circle. We are
able to rationalize the integral because we are able to “rationalize the circle”
by the parametric equations


$$x = \frac{1 - t^2}{1 + t^2}, \qquad y = \frac{2t}{1 + t^2},$$

which are those found in our search for Pythagorean triples in Section 2.1.
There we were interested in rational values of $t$, but it is easy to see that these
functions $x$ and $y$ of $t$ satisfy $x^2 + y^2 = 1$ for all complex values of $t \neq \pm i$.
However, rational substitution fails to work for the integral 


$$\int \frac{dx}{\sqrt{1 - x^4}}$$

because it fails to work for the curve


$$y = \sqrt{1 - x^4}, \quad\text{or}\quad y^2 = 1 - x^4.$$


As we saw in the exercises to Section 2.2, the reason for this is also related to
number theory, namely, a theorem of Fermat (1670) implying that the equation
$y^2 = 1 - x^4$ has no rational number solutions other than the trivial ones with
$x = 0$ or $y = 0$. The proof goes from the formula
for Pythagorean triples (Section 2.1) to the result that $c^2 = a^4 - b^4$ has no
solution in positive integers. It then follows by division that $y^2 = 1 - x^4$ has no
nontrivial solution in rational numbers. A completely analogous proof works with polynomials in
place of integers, since polynomials enjoy unique prime factorization, so we
can conclude that $y^2 = 1 - x^4$ has no solution in nonconstant rational functions.
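The contrast between the two curves can be seen in a small computation. The sketch below is ours, with hypothetical function names and an arbitrary search bound. It checks with exact rational arithmetic that the circle parametrization from Section 2.1 really lands on $x^2 + y^2 = 1$, and it searches rational $x = p/q$ of small denominator for points on $y^2 = 1 - x^4$. A finite search is evidence only, not a substitute for Fermat's theorem.

```python
from fractions import Fraction
from math import isqrt

def circle_point(t):
    """Point (x, y) on the unit circle given by the rational parameter t."""
    t = Fraction(t)
    d = 1 + t * t
    return (1 - t * t) / d, 2 * t / d

def is_square(n):
    """True when the integer n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n

def quartic_points(max_den):
    """Rationals x = p/q in lowest terms, 0 <= p <= q <= max_den, with 1 - x**4 a rational square."""
    found = []
    for q in range(1, max_den + 1):
        for p in range(0, q + 1):
            x = Fraction(p, q)
            if x.denominator != q:
                continue  # p/q was not in lowest terms; it is handled at a smaller q
            # 1 - x**4 = (q**4 - p**4) / q**4 is a rational square
            # exactly when q**4 - p**4 is a perfect square.
            if is_square(q ** 4 - p ** 4):
                found.append(x)
    return found

x, y = circle_point(Fraction(1, 2))
print(x, y, x * x + y * y)
print(quartic_points(200))
```

The parameter $t = 1/2$ gives the point coming from the triple (3, 4, 5), while the quartic search finds only the trivial values $x = 0$ and $x = 1$ in this range.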

These two examples suggest that difficulties in integration can be traced
to the nature of curves: particularly the fact that some curves can be
parameterized by rational functions and others cannot. The first to really grasp this
idea was Riemann, around 1850. However, long before this realization dawned,
Abel (1826b) had proved an amazing theorem measuring the complexity of
integrals of algebraic functions by a number he called the genus. Abel’s
theorem is too complicated to explain here, but we can explain one of
Riemann’s discoveries, which is that genus has a topological interpretation.


Figure 9.1 Bernhard Riemann (1826–1866) (public domain).


9.8.2 Algebraic Curves 


The equations that define algebraic functions (of one variable), 
$$a_n(x)y^n + \cdots + a_1(x)y + a_0(x) = 0, \quad\text{where } a_0(x), \ldots, a_n(x) \in \mathbb{C}[x], \qquad (*)$$


are simply polynomial equations in the two variables x, y, and hence they also 
define algebraic curves. To ensure that every polynomial equation has a root, 
it is usual to let x and y take all complex values, so the values of x form the 
plane $\mathbb{C}$. (It is also usual to include the value $\infty$, which closes the plane to a
sphere, but we will focus on the finite values of x and y for now.) The values 
of y form another copy of this plane. 

Riemann’s great idea was to view the y-plane as a covering of the x-plane,
with the finitely many values of $y$ that satisfy (*) for a given value $x_0$ lying
“above” $x_0$. Apart from finitely many exceptional values of $x_0$, called branch
points, the covering in a neighborhood of $x_0$ looks like a stack of parallel
planes, called sheets of the covering. Above a branch point, two or more sheets
spiral around a common point. The simplest case, shown in Figure 9.2, is
where the two sheets of the covering for $y = \sqrt{x}$ meet over $x = 0$. Over
every nonzero point $x$ there are two points of the covering, corresponding
to $+\sqrt{x}$ and $-\sqrt{x}$. This picture, from Neumann (1865), is misleading in
one respect. Trying to view the relation between the x-plane and y-plane in
three dimensions forces us to create a line of intersection of the two sheets. In
reality, the sheets of the y-plane meet only at the single point $y = 0$, and the
neighborhood of this point is a disk, like the neighborhood of any other point
on the y-plane.


Figure 9.2 Picture of a branch point. From Vorlesungen über Riemann’s Theorie
der Abel’schen Integrale by Carl Neumann, B. G. Teubner, 1865.


When we include $\infty$ among the values of $x$, it is natural to view the
collection of x-values as a sphere, often called the Riemann sphere. This is
explained by Figure 9.3, which shows how a sphere is mapped on to a plane
by projection from its “north pole.”


Figure 9.3 Correspondence between the sphere and $\mathbb{C} \cup \{\infty\}$.


Under this map, points near to the north pole are sent to distant points of the
plane, so the point $\infty$ in the plane naturally corresponds to the north pole itself.
Thus, if we allow $x$ and $y$ to take the value $\infty$, the relationship (*) between $x$
and $y$ becomes a covering of a sphere by a sphere with finitely many branch
points, called a Riemann surface. For example, the covering for $y = \sqrt{x}$ now
has two branch points, the other one at the point $x = \infty$.


Figure 9.4 Riemann surfaces of genus 1, 2, 3.

Because sheets fuse at branch points, the surface of y values is not 
necessarily a sphere. In fact, it can also take the forms shown in Figure 9.4. By 
a form, we now mean topological form, where any two surfaces in continuous 
one-to-one correspondence are considered to be of the “same form.” It is 
intuitively plausible, and can be proved rigorously, that the number of “holes” 
in the surface characterizes its topological form. The number of “holes” is 
called the genus because, incredibly, it is the same number as Abel’s genus, 
when his theorem is suitably interpreted as a fact about algebraic functions. 

One of the consequences of this discovery of Riemann is that the curves 
with genus 0 (that is, of spherical form) are precisely those that can be 
parameterized by rational functions. 


9.8.3 Construction of a Riemann Surface from a Function Field


The sudden appearance of Riemann surfaces, and their topology, in the theory 
of algebraic functions, caught 19th-century mathematicians by surprise. They 
had no rigorous framework for handling these concepts, and Riemann himself 
could back up his amazing insights only by intuition, which even included 
appeals to physics. Also, Riemann died before he was 40, so he did not have 
time to reflect, revise, and put his discoveries on a rigorous foundation. 

It fell to Dedekind and Weber to edit Riemann’s collected works in the 
1870s. They determined to build a rigorous foundation for Riemann’s theory, 
by turning Dedekind’s approach to algebraic numbers towards algebraic 
functions. As we have seen at the beginning of this section, there is a strong
analogy between algebraic numbers and algebraic functions. But there are also
special concepts in algebraic function theory with no apparent counterpart in 
algebraic number theory, such as Riemann surfaces. 

One of the triumphs of the Dedekind–Weber theory was to build a Riemann
surface from an algebraic function field. The idea is this: a point gives values to 
functions. Therefore, an assignment of values to the functions in a field can be 
regarded as a point, provided values are assigned consistently, in the following 
sense: 


• The value given to the constant function $c$ is $c$.
• If $f$ and $g$ get values $a$ and $b$, then $f + g$ gets value $a + b$.
• If $f$ and $g$ get values $a$ and $b$, then $f - g$ gets value $a - b$.
• If $f$ and $g$ get values $a$ and $b$, then $f \cdot g$ gets value $a \cdot b$.
• If $f$ and $g$ get values $a$ and $b$, then $f/g$ gets value $a/b$.
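The consistency conditions above say that "taking the value at a point" respects the field operations. As a minimal sketch, assuming the simplest setting of polynomials with rational coefficients evaluated at a rational point (our own simplification, not the Dedekind–Weber construction itself), the conditions can be checked directly. Polynomials are stored as coefficient lists, lowest degree first, and all names are hypothetical.

```python
from fractions import Fraction

def poly_add(f, g):
    """Sum of two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(n)]

def poly_mul(f, g):
    """Product of two nonzero polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def value_at(f, c):
    """Value assigned to the polynomial f by the point c (Horner's rule)."""
    acc = Fraction(0)
    for coeff in reversed(f):
        acc = acc * c + coeff
    return acc

f = [1, 0, 2]        # 1 + 2x^2
g = [-3, 1]          # x - 3
c = Fraction(1, 2)   # the "point"

a, b = value_at(f, c), value_at(g, c)
print(value_at(poly_add(f, g), c) == a + b)
print(value_at(poly_mul(f, g), c) == a * b)
print(value_at([7], c) == 7)   # a constant function keeps its own value
print(a / b)                   # value given to the quotient f/g, since b is nonzero
```

The quotient rule only makes sense as written when $b \neq 0$; at such a point the quotient $f/g$ simply receives the value $a/b$.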


This is a brilliant start, but so far we have only a cloud of points. How can this 
cloud be regarded as a “surface”? In particular, how can it have a genus? 

This problem is solved with the help of branch points. In Riemann’s 
theory, the genus of a surface is completely determined by the nature of its 
branch points (by a formula known as the Riemann–Hurwitz formula). The
Dedekind–Weber theory finds the nature of branch points algebraically. For
example, we can detect the branch point at $x = 0$ for $y = \sqrt{x}$, and the fact
that it involves two sheets, from the equation $y^2 - x = 0$. This makes it possible
to define genus algebraically. With this definition, Dedekind and Weber were 
able to give a purely algebraic proof of Abel’s theorem. 

There is much more, but perhaps this is enough to give a glimpse of 
the Dedekind–Weber theory of algebraic curves, a theory summed up in the
History of Algebraic Geometry by (Dieudonné, 1985, p. 29), as follows. 


Dedekind and Weber propose to give algebraic proofs of all of Riemann’s algebraic 
theorems. But their remarkable originality (which in all the history of algebraic 
geometry is only scarcely surpassed by that of Riemann) leads them to introduce a 
series of ideas that will become fundamental in the modern era.


